eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

MAIN

What's eigenclass.org?


2008-04-01 09:55 UTC The lightest lightweight threads, Protothreads

Last week, I used the Lwt cooperative lightweight thread library to implement a benchmark that measures context switch performance, determined that it was GC bound and timed it against comparable programs (i.e., the fastest implementations in the computer language benchmark games, which are all based on lightweight threads) and a C version that uses POSIX threads, obtaining these results:

implementationmemory usagetime (s)
Haskell GHC 6.8.22680KB1.22
OCaml ocamlopt 1024Kword minor heap5178KB1.85
Haskell GHC 6.8.2 -threaded, -N12760KB1.9
Erlang R12B-1 HiPE5996KB3.96
Mozart/Oz3788KB4.10
OCaml ocamlopt 32Kword minor heap970KB4.24
Haskell GHC 6.8.2 -threaded, -N23300KB15.27
GCC C (POSIX threads)4520KB28.7

Here are the figures I get for the C version I made with Protothreads:

Addendum This post wouldn't exist hadn't rektide asked for these results.

GCC C (Protothreads, optimum scheduling)220KB0.076s
GCC C (Protothreads, pessimum scheduling)220KB18.6s

(Read below for an explanation of the difference between optimum and pessimum scheduling policies.)

It is nearly 400 times faster than the C version with POSIX threads, and represents a one order of magnitude improvement over the other lightweight thread implementations. It also needs less memory. The performance is almost unbelievable. Where's the catch, ugly code maybe? Judge for yourself; here are the Protothreads and the POSIX threads C implementations head to head*1:

/* Protothreads, 0.076s*/                      /* POSIX threads, 28.7s */
#include <stdio.h>                             #include <stdio.h>
#include <stdlib.h>                            #include <stdlib.h>
#include "pt.h"                                #include <pthread.h>
#include "pt-sem.h"                            #include <limits.h>
                                               
#define NUM_THREADS 503                        #define NUM_THREADS 503
                                               
static struct pt thread_context[NUM_THREADS];  struct stack {
static int data[NUM_THREADS];                     char x[PTHREAD_STACK_MIN];
static struct pt_sem mutex[NUM_THREADS];       };
                                               
static                                         static pthread_mutex_t mutex[THREADS];
PT_THREAD(thread(struct pt *pt, int id))       static int data[THREADS];
{                                              static struct stack stacks[THREADS];
  static int token;                            
  static int r;                                static void* thread(void *num)
  PT_BEGIN(pt);                                {
                                                  int l = (int)num;
  while(1) {                                      int r = (l+1) % THREADS;
      PT_SEM_WAIT(pt, &mutex[id]);                int token;
      token = data[id];                        
      r = (id + 1) % NUM_THREADS;                 while(1) {
      if(token) {                                    pthread_mutex_lock(mutex + l);
          data[r] = token - 1;                       token = data[l];
          PT_SEM_SIGNAL(pt, &mutex[r]);              if (token) {
      } else {                                          data[r] = token - 1;
          printf("%d\n", id + 1);                       pthread_mutex_unlock(mutex + r);
          exit(0);                                   }
      }                                              else {
  }                                                     printf("%i\n", l+1);
  PT_END(pt);                                           exit(0);
}                                                    }
                                                  }
int                                            }
main(int argc, char *argv[])                   
{                                              int main(int argc, char **argv)
 int i;                                        {
                                                  int i;
 if(argc != 2) exit(255);                         pthread_t cthread;
 data[0] = atoi(argv[1]);                         pthread_attr_t stack_attr;
                                               
 for(i = 0; i < NUM_THREADS; i++) {               if (argc != 2) exit(255);
     PT_SEM_INIT(&mutex[i], 0);                   data[0] = atoi(argv[1]);
     PT_INIT(&thread_context[i]);              
 }                                                pthread_attr_init(&stack_attr);
                                               
 PT_SEM_INIT(&mutex[0], 1);                       for (i = 0; i < THREADS; i++) {
                                                     pthread_mutex_init(mutex + i, NULL);
 while(1) {                                          pthread_mutex_lock(mutex + i);
     for(i = 0; i < NUM_THREADS; i++)          
         thread(&thread_context[i], i);              pthread_attr_setstack(&stack_attr, &stacks[i], 
 }                                                                         sizeof(struct stack));
}                                                    pthread_create(&cthread, &stack_attr, thread, 
                                                                    (void*)i);
                                                  }
                                               
                                                  pthread_mutex_unlock(mutex + 0);
                                                  pthread_join(cthread, NULL);
                                               }
                                               

Even though the Protothreads code is similar to the one with pthreads, there are two important differences:

  • there is no thread-local data in the Protothreads version. r is recomputed on each iteration
  • the proto threads are scheduled manually by running all the corresponding functions in sequence

These dissimilarities give some insight into the nature of Protothreads. They are little more than small state machines whose state is stored in an opaque pt structure. Protothreads' implementation consists of a few preprocessor macros in headers, so the best way to see how they work is to examine the output of the preprocessor (gcc -E, slightly abridged and reformatted here):


Read more...

2008-03-31 11:39 UTC gibak 0.3.0 (backup tool using Git): OSX support, extended attributes, bugfixes

gibak is a backup tool based on git. Since gibak builds upon the infrastructure offered by Git, it shares its main strengths:

  • speed: recovering your data is faster that cp -a...
  • full revision history
  • space-efficient data store, with file compression and textual/binary deltas
  • efficient transport protocol to replicate the backup (faster than rsync)

gibak uses Git's hook system to save and restore the information Git doesn't track itself such as permissions, empty directories and optionally extended attributes and mtime fields.

You can read more about gibak here.

What's new in 0.3.0

  • OSX support
  • support for extended attributes both on Linux and OSX (it might work on other systems if their getxattr(2) interface matches Linux or OSX's).
  • a few bugfixes

Read README.upgrade if you used earlier versions of gibak and want to use extended attributes.

Getting it

The latest tarball as of 2008-03-31 is gibak-0.3.0.tar.gz.

You can always get the latest code with

git clone http://eigenclass.org/repos/git/gibak/.git/

The repository can be browsed at http://eigenclass.org/repos/gitweb

Thanks

  • Lee Marlow solved problems with submodule paths including spaces
  • sean provided most of the fixes required for OSX support

Some FAQs

(This will eventually be moved to a new node.)

Will the repository grow monotonically? Can I delete older history? What about large files?


Read more...

2008-03-24 17:11 UTC Comparing lightweight threads

The Computer Language Benchmarks Game includes a benchmark that measures context switch performance. The entries can be classified into three categories:

  1. those that rely on lightweight threads, namely Haskell GHC, Erlang HiPE and Mozart/Oz. I don't know whether Smalltalk VisualWorks and Scala belong here too; their performance is far from the top three entries. Update (2008-03-25) According to Julian Morrison and Calum, Smalltalk and Scala have lightweight threads and (lightweight) actors on top of Java thread pools, respectively (see comments below).
  2. those that use POSIX threads, with or without actual parallelism, including most other entries
  3. Ruby, a category of its own (?): very expensive user-mode threads with no parallelism

The OCaml entry is amongst the fastest pthread-based ones, but still markedly slower than the top entry, by around an order of magnitude. The version I wrote some time ago, based on the Lwt cooperative lightweight thread library, is close in performance to GHC. Some analysis reveals interesting facts about GHC's concurrency support and Lwt.

Performance

Here are the figures I get on a dual-core AMD Athlon 64 X2 6000+ in 32 bit mode (Why 32 and not 64? Because it's faster in this benchmark; 64 bit pointers are heavy and we get nothing in return in this case.)

Update: I've also timed the Mozart/Oz version and GCC C+pthreads as a baseline.

implementationmemory usagetime (s)
Haskell GHC 6.8.22680KB1.22
OCaml ocamlopt 1024Kword minor heap5178KB1.85
Haskell GHC 6.8.2 -threaded, -N12760KB1.9
OCaml ocamlopt 256Kword minor heap2016KB2.05
OCaml ocamlopt 64Kword minor heap1228KB3.06
Erlang R12B-1 HiPE5996KB3.96
Mozart/Oz3788KB4.10
OCaml ocamlopt 32Kword minor heap970KB4.24
Haskell GHC 6.8.2 -threaded, -N23300KB15.27
GCC C (POSIX threads)4520KB28.7

The Haskell code was compiled with -O2; I used erlc +native +"{hipe, [o3]}" for Erlang and -O3 -fomit-frame-pointer -march=athlong for GCC.

GC overhead

The OCaml version is clearly GC bound, and performance increases as the minor heap is enlarged, decreasing the amount of GC work. Whereas with the default 256KB heap the Erlang program is slightly faster, when OCaml is allowed to use comparable amounts of memory, it is over twice as fast.

speed.png
Read more...

2008-03-11 10:32 UTC Warm fuzzy things for random simulations

Let's talk about random experiments. The simplest one is tossing a coin, with outcomes "heads" and "tails". It's so elementary that we fully understand it intuitively once we're told that the coin is not loaded. There are three basic problems of interest:

  1. How to simulate the experiment with a computer and obtain results undistinguishable from the real thing?
  2. If we're given an outcome, can we compute its probability?
  3. Can we enumerate all possible outcomes and their probabilities?

In our first experiment, the answers are "rand 2" (if take e.g. "heads" = 0, "tails" = 1), "1/2" and "heads, tails, both with probability 1/2". Easy indeed. Now on to something harder: rolling a fair die. The answers are now "1 + rand(6)", "1/6", and "1..6, all with probability 1/6". Still trivial. Let's aim for the stars! What about rolling three dice? The outcomes are clearly 3..18, but what are the probabilities?

Intuition doesn't help much here --- we "can see" the possible outcomes, but most people cannot give the precise probabilities intuitively (I certainly can't; I can at most draw an approximate graph, but that's because I am used to convolutions). The first problem was how to simulate the experiment; this is by far the easiest one. This is what our dice rolling simulation looks like in Ruby:

  def roll_3d6
    d1 = rand(6)
    d2 = rand(6) 
    d3 = rand(6)
    1 + d1 + 1 + d2 + 1 + d3
  end

We can now simulate the experiment easily, but this doesn't help us with the harder problems: giving the probability of an outcome and doing so for all possible outcomes. Clearly, the above snippet has got all the information we need about this random simulation; it's indeed a precise definition. Wouldn't it be nice to have the computer solve the three problems of interest (simulation, distribution and expectation) for us if we give it the specification of the simulation? It turns out this can be done with a little code massaging:

  def roll_3d6(m)
    m.rand(6).in do |d1|
      m.rand(6).in do |d2|
        m.rand(6).in do |d3|
          m.ret(1+d1 + 1+d2 + 1+d3)
        end
      end
    end
  end

Given suitable Simulation, Distribution and Expectation classes, we can solve all three problems with the above code.

Simulating the experiment:

 p roll_3d6(Simulation.init).run    
 #  >> 9

All the possible outcomes and their probabilities:

 p roll_3d6(Distribution.init).run  
 # >>  
 # [[3, 0.00462962962962963], [4, 0.0138888888888889], [5, 0.0277777777777778], 
 # [6, 0.0462962962962963], [7, 0.0694444444444444], [8, 0.0972222222222222], 
 # [9, 0.115740740740741], [10, 0.125], [11, 0.125], [12, 0.115740740740741],
 # [13, 0.0972222222222222], [14, 0.0694444444444444], [15, 0.0462962962962963],
 # [16, 0.0277777777777778], [17, 0.0138888888888889], [18, 0.00462962962962963]]
roll_3d6.png

Getting the probability given an outcome:

 (1..21).each{|n| p roll_3d6(Expectation.init).run(n) }
 # >>
 # 0.0
 # 0.0
 # 0.00462962962962963
 # 0.0138888888888889
 # ...

What's really nice is that Simulation, Expectation and Distribution can be reused for other random experiments. Say you get an extra roll if any of the first 3 dice gave a 6:

  def roll_3d6_b(m)
    m.rand(6).in do |d1|
      m.rand(6).in do |d2|
        m.rand(6).in do |d3|
          if [d1, d2, d3].include?(5)
            m.rand(6).in{|d4| m.ret(1+d1 + 1+d2 + 1+d3 + 1+d4) }
          else
            m.ret(1+d1 + 1+d2 + 1+d3)
          end
        end
      end
    end
  end

Here's the distribution we get with roll_3d6_b(Distribution.init).run:

roll_3d6_b.png
Read more...

2008-03-05 14:54 UTC A better backup system based on Git

A fast, powerful backup system built upon Git and efficient, compact tools written in OCaml (faster than the C counterpart with 1/5th of the code :)

UPDATE (2008-03-31) gibak 0.3.0 released

Recent events have pushed me to get serious about backing up my data. I'm naturally inclined to use simple solutions over specialized backup systems, preferring something like rsync to a special-purpose tool. As far as "standard" tools go, however, git provides a very nice infrastructure that can be used to build your own system, to wit:

  • it is more space-efficient than most incremental backup schemes, since it does file compression and both textual *and* binary deltas (in particular, it's better than solutions relying on hardlinks or incremental backups à la tar/cpio)
  • its transport mechanism is more efficient than rsync's
  • it is fast: recoving your data is *faster* than cp -a
  • you keep the full revision history
  • powerful toolset with a rich vocabulary

I'm of course not the first one to think of git as the basis for a backup system. You can find many blog articles about this, but few people have gone beyond saying "stuff you data in a git repos" and tried to fill the holes and automate things properly. The most serious projects I've found are etckeeper and git-home-history. None of them gets it entirely right for my purposes, however (more on this below).

The tool I've written retains all the advantages from Git, and supplements it in some key areas:

  • metadata support
  • management of submodules (nested Git repositories)
  • automation of common operations; for instance, a commit consists of several steps:
    • determining if some files which were committed earlier are ignored now and removing them from the index
    • adding new and modified files to the index
    • registering new git submodules and copying them to a special area under .git
    • committing changes in the index
    • compaction and optimization of the repository

Only one command is needed in practice:

 gibak commit

I'm using it to save 2GB in over 200000 files, and it normally takes under 20 seconds to take a snapshot (AFAICS it scales linearly, so I expect to be able to backup 10GB in under 2 minutes...). The full power of the git toolset is available, so earlier versions can be restored with "git checkout", remote copies can be created/synchronized with git clone/push/fetch/pull, you can see what has changed with "git diff", and so on.

You can get the code with

 git clone http://eigenclass.org/repos/git/gibak/.git/

The repository can be browsed at http://eigenclass.org/repos/gitweb

The metadata issue

The major thing missing in Git when used as a backup tool is support for file metadata (mostly file permissions) and empty directories (git just ignores them). git-home-history doesn't handle them at all, and etckeeper relies on metastore to preserve a snapshot of the metadata (owner, group, permissions, mtime, etc.) in a .metastore file located at the top of the git repository (/etc in the case of etckeeper).

The problem with metastore is that it doesn't know about Git's file exclusion mechanisms (.gitignore), so it ends up storing the metadata of files that aren't actually to be saved. Even though metastore is quite small and simple (only some 1500 lines of C code), extending it to honor .gitignore files seemed fairly involved because the semantics is a bit tricky (subdirectories inherit patterns from their parents and there's more than one kind of pattern) and harder to express without higher-order functions and closures.

I decided to reimplement metastore in OCaml (I named it very unimaginatively ometastore) and I'm glad I did so. I implemented metastore's functionality in one fifth of the code (1500 lines of C vs. under 300 of OCaml), the resulting executable is faster by 10% without any optimization effort and needs less memory. Even with functionality like path prefix compression (which shrinks the snapshot by 50%) and Git-like semantics for ignored files, ometastore took 4 times less code than metastore (I've since optimized glob matching, making my directory traversal routine faster than git-ls-files', so it's up to 1/3rd of the size of metastore now).

The backup system


Read more...
Last modified:2005/11/08 03:28:40
Keyword(s):
References:[Opening up my hiki wiki: bliki.rb plugin]