eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Quicksort erratum

A few days ago, I called the following Haskell function, reminiscent of Quicksort and considered the epitome of beautiful code by many, unusable for taking "always O ( n sup 2 ) space and time":

   qsort []  = []
   qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

Moreover, I referred to this version (using a C-like DSEL in Haskell) due to Lennart Augustsson as a usable one (as opposed to a O ( n sup 2 ) algorithm):

   qsortM :: forall i a m r arr .
            (MonadRef m r, Num i, Ix i, MArray arr a m, Ord a Bool) =>
            arr i a -> m (arr i a)
   qsortM ma = runE $ do
       (lb, ub) <- embed $ getBounds ma
       let mlb = pure0 lb
           mub = pure0 ub
       a <- liftArray ma
   
       let qsort' l r =
               if1 (r > l) $ do
                   i <- auto l
                   j <- auto (r+1)
                   let v = a[l] :: E m a
                       iLTj = i < (j :: E m i)
                   while iLTj $ do
                       while ((i += 1) < mub && a[i] < v)
                           skip
                       while (a[j -= 1] > v)
                           skip
                       if1 iLTj $ do
                           a[i] =:= a[j]
                   a[l] =:= a[j]
                   qsort' l (j-1)
                   qsort' i r
        
       qsort' mlb mub
       return ma

I was wrong. Instead of analyzing it with some care, I believed the O ( n sup 2 ) claims I read long ago in some now forgotten blog. I just saw the list concatenations and reacted without thinking.

Finding an upper bound for the number of comparisons in the best case is easy. In fact, while I'am at it, I'll find a bound for the number of cons operations, which is clearly greater than that of comparisons.

Noting the number of conses C(n) for a list of size n, for n > 3 (there are a few irregularities for n = 1 to 3 because no cons is needed when appending or prepending [] to another list), C(n) = p + C(p) + 1 + q + C(q) + p + 1, where the problem for a list of length n has been divided into two halves (lists of length p and q) plus a lone pivot element. p (resp. q) conses are needed to build the list passed to the recursive call to qsort, C(p) (C(q)) conses are needed for the subproblem, and finally p + 1 conses are needed to concatenate the resulting sorted sublists.

The problem is how to find p and q. For the best case analysis, p approx q approx n over 2. The n list can be divided into the desired three parts as follows:


n mark = left floor n over 2 right floor + left ceiling n over 2 right ceiling 
= left floor n over 2 right floor + 1 + left ceiling n over 2 right ceiling - 1
= left floor n over 2 right floor + 1 + left floor { n + 1 } over 2 right floor - 1

n lineup = left floor n over 2 right floor + 1 + left floor { n - 1 } over 2 right floor

Therefore,

C(n) = C left ( left floor n over 2 right floor right ) + C left ( left floor { n - 1 } over 2 right floor right ) + 2 left floor n over 2 right floor + left floor { n - 1 } over 2 right floor + 2

(There are two choices for the number of conses in the final list concatenation, differing by 1. The above is a conservative one, which will give an upper bound.)

Even after referring to Graham & Knuth & Patashnik, I saw no immediate way to solve this recurrence (please drop a comment if you have a closed-form solution). Fortunately, I just need an upper bound so I can use

C(n) <= D(n) = 2 D left ( n over 2 right ) + 3 left ( n over 2 right ) + 2

which is easily proved by induction, since all I've done is to remove the floor operations.

Then, using the master theorem

D(n) = THETA left ( n sup { log sub 2 2 } log n right )

C(n) = O left ( n log n right )

(Note the use of O for C(n) and THETA for D(n).)

The above program thus uses at most O(n log n) space and time. The upper bound is satisfactory enough; in fact, it's the same as for the real quicksort, and, furthermore, the lowest achievable bound for a comparison sort.

So far, I've assumed that the program evaluates everything eagerly, which might not be the case in Haskell, as it has a non-strict semantics (which is not exactly the same as being lazy). Let's see if GHC is really smart and can recognize that qsort(l) is just a permutation of l to skip the sort altogether when some operation that doesn't depend on the order is performed (this is quite close to proving that the qsort function actually sorts the list, and would be an amazing exploit indeed).

To that end, I first rewrite the function in OCaml; if GHC is able to do less work thanks to its non-strict semantics, the solution will scale better than with OCaml:

let rec qsort = function
    [] -> []
  | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)

It's very similar to the Haskell one; the main difference is in the signs used with List.filter: ((>) x) means "x is greater than _", as opposed to "_ is greater than x" in Haskell.

The main function just creates a list of random integers, sums them up, and finally adds the numbers from the sorted list --- obtaining the same number, I hope (the sums are printed to stdout to make sure that computation is not omitted).

Here are the times I get:

times.png

The GHC binary takes nearly 800MB to sort a 3 million list, which explains why it suffers as n increases.

Well, this isn't... what was it, GHC 11? Being lazy does pay off if you try to get the nth (and in particular the first) element, though; using

 print (head $ qsort xs)

instead of sum,

   $ time ./min_sort 3000000
   4498748904877
   1
   1.252078
   
   real	0m3.542s
   user	0m3.296s
   sys	0m0.224s

(the real and user time include the time needed to build the list and perform the first sum). The theory says this can be O left ( sum from i=0 to { left ceiling log sub 2 n right ceiling } 2 sup i right ) = O (n). GHC seems to be evaluating only the first half of the list, as it should, resulting in quite an improvement :)

open Printf
open ExtList

let rec qsort = function
    [] -> []
  | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)

let () =
  let n = int_of_string (Array.get Sys.argv 1) in
  let a = Array.to_list (Array.init n (fun _ -> Random.int n)) in
  let t0 = Sys.time () in
    printf "%d\n" (List.fold_left (+) 0 (qsort a));
    printf "Needed %8.5fs.\n" (Sys.time () -. t0);
    Gc.print_stat stdout

   module Main where
   
   import System.CPUTime
   import System.IO
   import System.Environment
   import System.Random
   
   qsort []  = []
   qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
   
   main = do
     strn:_ <- getArgs
     let n = (read strn) :: Int
     gen <- getStdGen
     let xs' = randomRs (1::Int, n) gen
     xs <- return (take n xs')
     print (sum xs) -- make sure xs is evaluated and stuff
     hFlush stdout
     t1 <- getCPUTime
     print (head $ qsort xs)
     t2 <- getCPUTime
     print (fromIntegral (t2 - t1) * 1e-12)


Comment - Chris (2008-07-15 (Tue) 06:54:11)

Very clear and well-written. This is the first time I've stumbled on eigenclass (thanks to reddit); and buried within the article here is:

"[O(nlogn) is] the lowest achievable bound for a comparison sort."

I don't doubt it, but do you know of an elegant proof of the above statement?

Sean 2008-07-15 (Tue) 07:14:01

I'm not sure there are proofs, but there are many experimental confirmations that O(n log n) is the lower-bound for comparison sorts. Other sorts can achieve O(n + k) or O(n) with limitations on the type of data sorted.

mfp 2008-07-15 (Tue) 07:29:20

It's an information-theoretical argument.

Here's the gist of it: there are n! permutations of a sequence of n elements, so the amount of information needed to pick a given permutation out of the n! possibilities is log n! bits (with log in base 2). Each comparison yields at most 1 bit (if both results, < and >=, are equiprobable), so we need log n! comparisons at least. Using Stirling's approximation, log n! =~ 1/ln(2) * (n ln n - n + 1), showing that a comparison sort is OMEGA(n log n) (bounded below asymptotically by Cn*log n for some C).

RegalCynicalSunrise 2008-07-23 (Wed) 15:46:37

A few comments on the last quicksort-related article made the same mistake of assuming the naive sort is O(n^2). It seems the real issue with the naive Haskell quicksort is space explosion because it's not an in-place sort. Also, if it were more strict (as the OCaml version is) it'd be faster, as this duder shows.

Name:
Comment:
TextFormattingRules
Preview:

No Title - Anonymous (2008-07-15 (Tue) 07:19:29)

@Chris:

see http://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list

mfp 2008-07-15 (Tue) 07:33:33

argh, should have looked for this before writing :)

Name:
Comment:
TextFormattingRules
Preview:

Thanks - Chris (2008-07-15 (Tue) 07:24:16)

Right, I could have just Wikied it x_x;

Thanks for the link, which demonstrates yet another great use of Stirling's Approximation :-D

Name:
Comment:
TextFormattingRules
Preview:

But that's still only the average case... - Reid (2008-07-15 (Tue) 09:20:11)

I think you realize this, but I don't think you make it clear that you are only proving this for the best case. Naive quicksort is still worst case n^2, because p!=q!=n/2. But that's okay, because that quicksort is totally slick.

mfp 2008-07-15 (Tue) 13:09:00

It's average-case analysis indeed, which is why mergesort is preferred for lists.

Perl switched to mergesort some time ago (to avoid quicksort's O(N^2) worst case), but now there's a pragma to choose the algorithm (it also shuffles the array before quicksort to avoid the O(N^2)).

Name:
Comment:
TextFormattingRules
Preview:

quicksort erratum erratum - Edward Kmett (2008-07-16 (Wed) 15:17:22)

The list concatenations really do impose an undue cost in the Haskell version. You can see the difference in the average case analysis by the pronounced upturn in the runtime of the Haskell quicksort in your performance chart. Counting conses is not sufficient, because you have to expand a non-constant number of function calls to reach each cons.

A succinct version of Haskell quicksort with actual quicksort average case performance and an explanation of why the naive quicksort fails to have it can be found here:

http://comonad.com/reader/2008/a-sort-of-difference/

mfp 2008-07-16 (Wed) 17:08:23

quicksort erratum erratum erratum? ;) O(n^2) just doesn't fit the observed behavior. (Edward is now trying to prove O(n log^2 n).)

Name:
Comment:
TextFormattingRules
Preview:

.GT. and .LT. - BMeph (2008-07-16 (Wed) 19:16:24)

Just as an aside, ((>) x) means the same thing in Haskell and OCaml - "x is greater than _". Haskell just lets you write it as a section.

Name:
Comment:
TextFormattingRules
Preview:

Qsort ocaml - Vijay Chakravarthy (2008-07-17 (Thr) 09:28:34)

Just a quick question:

Would'nt let rec qsort = function

   [x] -> [x]
 | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)

be faster, since it avoids creation of a bunch of empty lists, and calls to cons?

mfp 2008-07-18 (Fri) 12:29:14

Don't forget you need the [] case to make sure the function works with empty lists :) (the compiler will complain otherwise, since the pattern wouldn't be exhaustive). Also, [x] -> [x] does cons (note that [] doesn't, it's a constant constructor), but this would cons less indeed (whether it's faster or not depends on whether the extra work in the pattern matching is compensated by the reduced allocation rate):

   let rec qsort = function
     [] -> []
   | [x] as x -> x
   | x::xs -> qsort (List.filter ((>) x) xs) @ (x :: qsort (List.filter ((<=) x) xs)

where I've also replaced the intermediate [x] @ ... with x ::, saving one cons per level (not that it matters at all really :).

The obvious way to speed this up further would be to use List.partition instead of two List.filter applications, hence scanning the list only once.

Name:
Comment:
TextFormattingRules
Preview:

Use this form to create a new top-level comment; for direct replies to existing comments, use the text entries you'll find below.

Name:
Subject:

TextFormattingRules
Preview: