About problem formulations and ordered permutations
The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. -- Albert Einstein
Here's an unremarkable example where the very description of the problem leads you in the wrong direction (on purpose?).
Here is an interesting coding challenge: write a counter function that counts from 1 to max but only returns numbers whose digits don’t repeat(...) Also display the biggest jump and the total count of numbers (...).
It suggests a brute-force solution, with max iterations and a predicate to verify if the counter satisfies the "no repeated digits criterion".
A vastly better approach (with 1126 times fewer iterations when considering numbers up to 10^10) becomes apparent if the problem is reworded as follows:
generate permutations of the digits '0' to '9' that do not start with '0' and whose numerical value is smaller than 'max', in increasing order
The "do not start with '0'" restriction is required to avoid generating a permutation twice (e.g. 12 and then 012), since a leading zero is ignored.
The easiest way to generate a permutation with m items from a set of n elements is to pick one element and prepend it to a (m-1) permutation of items from the resulting set of (n-1) elements. The "not starting with 0" criterion is fulfilled simply by picking the initial element from the digits different from zero and then putting zero back in the set used in the recursion.
Update (2008-07-10) A couple variants of the code below (still purely functional) which encode sets differently (using two lists, which won't surprise you at all if you've read Okasaki) are around 3 times faster and comparable in speed with Bob Lee's improved version.
This is expressed directly in OCaml as follows (you can find the full code of the OCaml and Ruby versions below):
module S = Set.Make(struct type t = int let compare = (-) end) let (--) set elm = S.remove elm set let permutations f zero digits len = let base = S.cardinal digits in let rec aux n digits = function 0 -> f n | len -> S.iter (fun s -> aux (n * base + s) (digits -- s) (len-1)) digits in S.iter (fun s -> aux s (digits -- s) (len - 1)) (digits -- zero)
The function is general in the sense that it will find unique permutations in any base (that is, by placing the numbers 10 to 15 in the initial set, a sequence with the digits '0'..'9' + 'a'..'f' can be generated with the same code).
Here's the translation to Ruby:
def aux(n, len, syms, &b) if len == 0 yield n return end syms.each{|s| aux(n * 10 + s, len - 1, syms - [s], &b) } end def compute_seq(zero, syms, len, &b) (syms - [zero]).each{|s| aux(s, len - 1, syms - [s], &b) } end
This version is not as general as the OCaml one, but that's not its only problem. This is how long it takes to find the 8877690 numbers under 10^10 with unique digits:
| Ruby 1.8.6 (base 10) | 73s |
| Ruby 1.9 (base 10) | 63s |
| OCaml (any base) | 0.640s |
As usual, Ruby is 100 times slower.
The OCaml version is more general than the strict minimum required by the original problem (it works with arbitrary bases), and thus presumedly slower than a more specific one. It doesn't fare badly against the fastest implementation I've found, an equivalent solution in Java that uses a bitmask to represent the sets:
| Java (base 10, bitset) | 0.680s without JVM startup overhead |
The Ruby solution uses Array#-, also know as "the poor man's set subtraction". Removing an element from an array is an O(n) operation. In the above-mentioned Java version, iterating over the remaining digits is O(all digits), instead of O(remaining ones). The OCaml version uses a set based on balanced trees where the relevant operations are logarithmic (it has a fairly large constant factor, so an integer set based on a bitset would be faster in this case). This means that the OCaml version would perform nearly as well with base-1000 permutations, as opposed to the other solutions which would become nearly 100 times slower.
Ironically, the OCaml stdlib is richer than Ruby's when it comes down to data structures. (The irony is that, while OCaml makes it very easy to build a data structure with good performance, in Ruby you are effectively stuck with the core classes implemented in C, which means essentially Array and Hash --- Ruby is so slow that algorithmic improvements hardly make up for the interpretation overhead.)
Here's the full code:
(* Copyright (C) 2008 Mauricio Fernandez <mfp@acm.org> http//eigenclass.org
Solution to the coding challenge at
http://beust.com/weblog/archives/000491.html *)
open Printf
module S = Set.Make(struct type t = int let compare = (-) end)
let (--) set elm = S.remove elm set
let permutations f zero digits len =
let base = S.cardinal digits in
let rec aux n digits = function
0 -> f n
| len -> S.iter (fun s -> aux (n * base + s) (digits -- s) (len-1)) digits
in S.iter (fun s -> aux s (digits -- s) (len - 1)) (digits -- zero)
let () =
let max = 10_000_000_000 in
let digits = List.fold_right S.add [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] S.empty in
let count = ref 0 and prev = ref 0 and maxj = ref 0 and base = ref 0 in
let report () = printf "Found %d numbers under %d.\n" !count max;
printf "Max jump: %d (%d -- %d).\n" !maxj !base (!base + !maxj)
in try
for i = 1 to 10 do
permutations
(fun num ->
if num >= max then raise Exit;
(* printf "%d\n" num; *)
incr count;
let jump = num - !prev in
if jump > !maxj then (maxj := jump; base := !prev);
prev := num)
0 digits i
done;
report ()
with Exit -> report ()
# Copyright (C) 2008 Mauricio Fernandez <mfp@acm.org> http//eigenclass.org # Solution to the coding challenge at # http://beust.com/weblog/archives/000491.html def aux(n, len, syms, &b) if len == 0 yield n return end syms.each{|s| aux(n * 10 + s, len - 1, syms - [s], &b) } end def compute_seq(zero, syms, len, &b) (syms - [zero]).each{|s| aux(s, len - 1, syms - [s], &b) } end max = 10_000_000_000 digits = (0..9).to_a count = prev = maxj = base = 0 (1..10).each do |i| compute_seq(0, digits, i) do |x| break if x >= max # puts x count += 1 j = x - prev maxj, base = j, prev if j > maxj prev = x end end puts "Found #{count} numbers under #{max}." puts "Max jump: #{maxj} (#{base} -- #{base + maxj})."
No Title - Anonymous (2008-07-02 (Wed) 13:17:40)
I'd rephrase "permutations of the digits '0' to '9'" as "permutations from the digits ..." or "permutations of a subset of the digits...". I'm not sure about the former of the proposed; I don't know if "permutations from ..." has a well defined mathematical meaning. But I do know that the original (in my interpretation) would require each of the digits to be used.
Jacob Fugal 2008-07-02 (Wed) 13:22:26
Sorry, that above comment was from me. You can also disregard it. From http://en.wikipedia.org/wiki/Permutation:
In combinatorics, a permutation is usually understood to be a sequence containing each element from a finite set once, and only once...
However, there is also a traditional more general meaning of the term "permutation" used in combinatorics. In this more general sense, permutations are those sequences in which, as before, each element occurs at most once, but not all elements of the given set need to be used.
I was only familiar with the first definition; you are obviously using the second.
mfp 2008-07-02 (Wed) 13:29:42
I hesitated as I was writing it. I actually thought I was abusing the language since I was only familiar with the first definition :-), but "sequences with m elements taken from the original set where none is repeated" (the expression that came to mind) weighs a billion tons.
hmm I still suspect it's a convenient abuse of language.
The coding challenge - Cedric (2008-07-02 (Wed) 16:32:59)
Hi Maurizio,
I certainly didn't mean to mislead anyone when formulating this problem, and I'm not quite sure why you think I did. I did leave a few details out, but that was unintentional, and that certainly didn't stop a lot of enthusiast coders from trying to solve the problem.
My goal was just to compare as many languages as possible, but while I did get that, the exercise also turned into a very interesting optimizing challenge where pretty much only Java (and probably C and C++) are left standing. I am quite impressed by OCaml's performance, though, so thanks for providing your solution and your timings.
How about adding a comment on my blog pointing to this page so that future readers can see your solution as well?
-- Cedric
mfp 2008-07-02 (Wed) 17:55:28
My apologies, I didn't want to imply any malice on your part. I just thought out loud that maybe a broader family of solutions would be explored if the problem statement didn't hint towards a "good enough" (maybe as good as it gets?) one.
I thank you for coming up with the problem in the first place; I've learned a few things while solving it :-) (to wit, that Set is fast compared to bitsets even for very small cardinalities, and that I should avoid going directly for a specialized solution because the general one might be faster than expected).
No Title - David (2008-07-03 (Thr) 05:05:35)
This solution seems like a mistake, in that the trivial brute-force version can be expressed in a single line of ruby, (and OCaml with appropriate helper functions), and is simple for essentially any coder to understand. The permutation solution is undoubtedly faster, but much harder to program correctly and understand. If the speed is needed, then moving to the permutation is required, but until that time it seems overly complicated and wasteful of programmer time.
mfp 2008-07-03 (Thr) 07:09:08
The only problem with the 1-line brute-force solution in Ruby is that it would take 3 days to run (rough estimate based on how it scales up to 1e6), compared to 1 minute in Ruby or half a second in OCaml with the above implementations.
We have to draw a line somewhere. I think performance matters, even for a one-shot task, when the naïve approach takes over one day of CPU time.
You might argue that this very exercise is "overly complicated and wasteful of programmer time", but that's a different (meta-)discussion. If that line of thought is followed to its ultimate consequences, it could be said that nothing matters and that no single line of code (or any other human action whatsoever) has ever been worth doing. There's clearly little interest in going that far.
solution based on unordered permutations - Serge Sivkov (2008-07-11 (Fri) 01:02:49)
Hello, i've don't use ordered permutation and have solved that task as:
- generate all combinations C_m_n: m in 1..limit
- permute them to collect all possible numbers with
- sort them all filter out duplicate numbers (f.e. numbers with leading zero)
- convert resulted sequence to form like (1st,2nd),(2nd,3rd),...
- calculate distance for each pair and select max
#light
let rec distribute elt = function
| (hd :: tl) as list ->
(elt :: list) :: (List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ]
//generate unordered sequence of lists with unique digits (leading '0' is possible)
//all lists have same length (lim)
let gen lim =
let rec gen r k =
seq {
for i in k..9 do
let r' = i::r
if List.length r' = lim then yield r'
else yield! gen r' (i+1)
}
gen [] 0
let l2n l = List.fold_left (fun r n -> r*10+n) 0 l
let f (dist,len) (a,b) = max (b - a) dist, len+1
let u_seq lim =
seq {
for i in 1..lim do
for s in gen i do
yield! permute s
}
u_seq 10 |> Seq.map l2n |> Set.of_seq |> Seq.pairwise |> Seq.fold f (0,1);;
Real: 00:02:13.864, CPU: 00:02:12.265, GC gen0: 5276, gen1: 373, gen2: 25
val it : int * int = (20280249, 8873992)
mfp 2008-07-15 (Tue) 04:18:30
This is F#, right? The "light" syntax looks pretty good. There's one thing I'm not sure about, though: is
let a = ... in foo a; bar a
written
let a = foo a bar a
(i.e. with no semicolon)? If so, I'd feel a bit uneasy about the indentation of expresions that evaluate to unit, since it'd be easy to confound
for i in ... do
for j in ... do
foo i j
bar i j
with
for i in ... do
for j in ... do
foo i j
bar i j
Keyword(s):[blog] [ruby] [ocaml] [frontpage]
References: