Comparing lightweight threads
The Computer Language Benchmarks Game includes a benchmark that measures context switch performance. The entries can be classified into three categories:
- those that rely on lightweight threads, namely Haskell GHC, Erlang HiPE and Mozart/Oz. I don't know whether Smalltalk VisualWorks and Scala belong here too; their performance is far from the top three entries. Update (2008-03-25) According to Julian Morrison and Calum, Smalltalk and Scala have lightweight threads and (lightweight) actors on top of Java thread pools, respectively (see comments below).
- those that use POSIX threads, with or without actual parallelism, including most other entries
- Ruby, a category of its own (?): very expensive user-mode threads with no parallelism
The OCaml entry is amongst the fastest pthread-based ones, but still markedly slower than the top entry, by around an order of magnitude. The version I wrote some time ago, based on the Lwt cooperative lightweight thread library, is close in performance to GHC. Some analysis reveals interesting facts about GHC's concurrency support and Lwt.
Performance
Here are the figures I get on a dual-core AMD Athlon 64 X2 6000+ in 32 bit mode (Why 32 and not 64? Because it's faster in this benchmark; 64 bit pointers are heavy and we get nothing in return in this case.)
Update: I've also timed the Mozart/Oz version and GCC C+pthreads as a baseline.
| implementation | memory usage | time (s) |
| Haskell GHC 6.8.2 | 2680KB | 1.22 |
| OCaml ocamlopt 1024Kword minor heap | 5178KB | 1.85 |
| Haskell GHC 6.8.2 -threaded, -N1 | 2760KB | 1.9 |
| OCaml ocamlopt 256Kword minor heap | 2016KB | 2.05 |
| OCaml ocamlopt 64Kword minor heap | 1228KB | 3.06 |
| Erlang R12B-1 HiPE | 5996KB | 3.96 |
| Mozart/Oz | 3788KB | 4.10 |
| OCaml ocamlopt 32Kword minor heap | 970KB | 4.24 |
| Haskell GHC 6.8.2 -threaded, -N2 | 3300KB | 15.27 |
| GCC C (POSIX threads) | 4520KB | 28.7 |
The Haskell code was compiled with -O2; I used erlc +native +"{hipe, [o3]}" for Erlang and -O3 -fomit-frame-pointer -march=athlong for GCC.
GC overhead
The OCaml version is clearly GC bound, and performance increases as the minor heap is enlarged, decreasing the amount of GC work. Whereas with the default 256KB heap the Erlang program is slightly faster, when OCaml is allowed to use comparable amounts of memory, it is over twice as fast.
No significant change in the speed of the GHC version was observed when I changed the size of the allocation area with the -A RTS option. Clearly, Lwt causes far more allocation than Haskell's MVars (probably because the latter is written in C--, more on that below). The GC statistics obtained with Gc.print_stat for OCaml and -sstderr for GHC confirm that intuition:
OCaml 1024Kword stack OCaml 32Kword stack minor_words: 369560098 minor_words: 369589896 promoted_words: 4075042 promoted_words: 130355359 major_words: 4075554 major_words: 130355871 minor_collections: 352 minor_collections: 11278 major_collections: 176 major_collections: 5639 [...] [...]
The profiler indicates that over 30% of the time is spent in GC and caml_modify (which is used in the write barrier). This contrasts with GHC:
80,753,700 bytes allocated in the heap
1,984,712 bytes copied during GC (scavenged)
5,120 bytes copied during GC (not scavenged)
634,880 bytes maximum residency (1 sample(s))
154 collections in generation 0 ( 0.00s)
1 collections in generation 1 ( 0.00s)
[...]
%GC time 0.3% (1.2% elapsed)
[...]
Even though the OCaml version allocates over 1400MB where GHC takes but 77MB, and scavenges 8 times more data, it is not far speed-wise.
If anybody knows how to obtain GC statistics for Erlang, that might yield some interesting figures.
Implementation
The solution based on Lwt naturally resembles the Haskell version. Threads communicate via synchronised mutable variables ("MVars") which act as mutable cells with two basic operations, put and take, with the following semantics:
- a thread that tries to take a value from an empty MVar blocks until another thread puts a value
- a thread that tries to put a value into an non-empty MVar blocks until it is emptied
For the sake of simplicity, I have relaxed the semantics of put so that it doesn't block when the MVar isn't empty (this means the previous value would be lost). I call the resulting abstraction an MVar' (the MVar semantics require 6 more lines of code; the resulting executable is around 10% slower) in the code below*1
module MVar' =
struct
open Lwt
type 'a t = {
mutable v : 'a option;
rd : 'a Lwt.t list ref
}
let create () = { v = None; rd = ref [] }
let make x = { v = Some x; rd = ref [] }
let awaken l x = match !l with
[] -> ()
| hd::tl -> l := tl; wakeup hd x
let add_wait l = let w = wait () in l := w :: !l; w
let take t = match t.v with
Some x -> t.v <- None; return x
| None -> add_wait t.rd
let put t x = match !(t.rd) with
hd::tl -> awaken t.rd x
| [] -> t.v <- Some x
end
(I've used a list of blocked readers instead of a queue for conciseness; it makes no difference in the threadring benchmark.)
The implementation is very simple. An MVar' holds a value v and a queue of blocked readers (in the MVar, there's another queue for blocked writers). The take operation checks whether the MVar' is empty. If it is (i.e., t.v is None) the current thread is added to the queue; otherwise (when t.v matches Some x), the cell is emptied and the value returned.
Whereas take requires three to four lines of OCaml code, GHC's implementation of that primitive operation (ghc6-6.8.2/rts/PrimOps.cmm) takes around 180 lines of C--. This holds for the rest of the primitives too: Lwt is written in pure OCaml and all the non-inlinable GHC primops are written in C-- (PrimOps.cmm takes around 2200 LoCs). This explains in part their relative performance (another reason is that ocamlopt doesn't avoid extra closures in monadic code the way GHC does).
Last year's "Lightweight concurrency primitives for GHC" paper introduces a new approach where most of the concurrency support is moved to Haskell code that relies on a small C/Cmm runtime system, leading to less error-prone and more maintainable code at the cost of a small performance drop. It seems GHC is moving in that direction, but as of GHC 6.8.2 the shift hasn't been completed (putMVar is still written in Cmm).
Benchmark code
Here's the threadring benchmark per se, built atop the above MVar' abstraction:
open Lwt let ring = 503 let rec thread i l r = perform m <-- MVar'.take l; let () = if m == 1 then print_int i; MVar'.put r (m-1) in if m > 0 then thread i l r else return () let () = let a = MVar'.make (int_of_string Sys.argv.(1)) in let z, ts = Array.fold_left (fun (l, ts) i -> let r = MVar'.create () in let t = thread i l r in (r, t :: ts)) (a, []) (Array.init (ring - 1) ((+) 2)) in ignore_result (choose (thread 1 z a :: ts)) let () = at_exit (fun () -> Gc.print_stat stdout)
It uses the pa_monad syntax extension, if only to get rid of one bind.
Improvements to GHC's thread scaling - dons (2008-03-24 (Mon) 11:54:54)
Interesting post!
I've one question, and one note.Firstly, does Lwt support mulitple processors? Secondly, you might be interested in this ticket, about GHC's forkIO threads not scaling linearly,
http://hackage.haskell.org/trac/ghc/ticket/1589
a patch to fix this should appear in the 6.10 release in a few months.
-- Don S.
mfp 2008-03-24 (Mon) 12:43:01
According to Simon Marlow, it is a GC problem, and he reports a GC time of 76.4% after the fix. In threadring, I'm getting only 0.3% GC time, so the bug doesn't seem to be triggered by a mere 503 threads (in the bug ticket, 10^5 to 10^6 threads were used).
Unlike GHC with -threaded, Lwt doesn't support multiple processors. In this example, however, using more than one processor doesn't help (-N2 makes it an order of magnitude slower). In OCaml, you'd have to use several processes, maybe coordinated with JoCaml's Join Calculus. It seems to me that lightweight threads are more about clean system models than about concurrency parallelism, so using different things for different purposes doesn't feel too onerous. Nonetheless, I can see the value in using the same abstraction for "lightweight concurrency" and "actual parallelism". This is what happens in the case of JoCaml for parallel and distributed programming: you don't care whether a channel is local or remote. So, in the concurrency/parallelism range you have:
| language | (lightweight) concurrency | parallelism | distributed programming |
| Haskell | Control.Concurrent abstractions | X? | |
| OCaml | Lwt/Thread | JoCaml | |
What should stand for the above X, i.e. what does one use in Haskell for distributed programming? Is there some implementation of message-passing concurrency or better yet a Haskell extension with support for distributed programming? I've only seen O'Haskell, which seems incomplete and unmaintained ("O'Hugs, the O'Haskell interpreter New release January 31 2001!").
mfp 2008-03-24 (Mon) 12:50:41
Haskellwiki also points to Glasgow Distributed Haskell, which seems a bit old: it needs the GHC 5.00 sources and the last activity report, from 2006, says it's an alpha-release.
comparison - rektide (2008-03-24 (Mon) 17:39:56)
Other lightweight thread benchmarks that would be fun to see:
- Mozart/Oz
- Protothreads
- Mono Continuations
- Stackless Python
Heres a totally random link to a comparison of Stackless Python and Erlang.
mfp 2008-03-25 (Tue) 11:38:10
I've added the Mozart/Oz times for reference, and implemented yet another version using protothreads; I'll write about it later.
No Title - Julian Morrison (2008-03-25 (Tue) 06:09:50)
Smalltalk and Scala use lightweight threads that are implemented in the language (rather than in the underlying VM). That puts them intermediate between OS-heavy and VM-lightweight in speed, but I expect they scale up to 1000s of threads like VM-lightweight.
Calum 2008-03-25 (Tue) 11:36:41
I think that the Scala actors implementation is backed by a Java threadpool which is shared across actors, so it is quite similar to the lighter-weight systems. Technically there won't be thousands of "threads" though, there'll be a large number of actors accessing a smaller number of threads.
mfp 2008-03-25 (Tue) 11:42:18
In the case of Smalltalk, are lightweight threads implemented on top of continuations? Lwt is also implemented in the language, yet it is much closer in performance to GHC's C-- concurrency support.
cwp 2008-03-25 (Tue) 11:57:33
Smalltalk VMs typically provide a few primitives for preempting running threads, working with semaphores etc, and the rest is implemented in Smalltalk with calls to the primitives.
biapefaby 2008-07-07 (Mon) 22:05:28
EARN - $5.00 - CONTINOUS PAYMENT NOTIFICATIONS!
WE DO ALL THE ADVERTISING OF THIS PAGE AND YOU GET PAID!
Add your own http://www.alertpay.com/?RNMcRvvmrBvBCAPQjafUoQ%3d%3d - AlertPay account to our network and you will become one of many members who can earn money every day. NO REFERRING REQUIRED!
You will be given your unique URL so you can advertise it on your site and BOOST your Earnings even more. All ID's are displayed 100% RANDOMLY
There are only 3 SIMPLE steps to create your lifetime account, and get your payment receiving addresses into our Network! Pay a One Time $5.00 to your sponsor Pay a One Time $5.00 to your random member Pay a One Time admin fee ($5.00)
It is That Easy!
NO ADVERTISING REQUIRED!
You Do Not need to advertise your URL, we do this FOR you! However, you will receive your own unique url that you can promote to boost your incoming payments if you want to get even more traffic to your site.
THIS ONE-TIME INVESTMENT WILL PAY YOU OVER AND OVER!
http://indecentlyrich.com - indecentlyrich.com
smoogeenvesia 2008-07-09 (Wed) 16:08:53
Hello, Does someone know a good pay day loan service? I have read about http://www.paydayloanseo.com that is a good company, but i wan't to know if someone has used this service before and his oppinion. Looking forward for an answear. Thanks, smoogeenvesia
bessexhasty 2008-07-09 (Wed) 23:28:11
25 year-old Maverick Internet Marketer creates new Google software... which allows him and his fellow ‘partners in crime’ to quietly siphon up to $2,750 per day from Google, right under the noses of the establishment. He's already used this tool to siphon $64,400 from Google in just a few weeks - all with "one little ad"... And now he's giving you the chance to swipe his automated system - and generate a second income from home...
This story is already sending major shockwaves throughout the Internet Marketing Community.
So what’s all the fuss about?
This is the story of how a 25 year-old “underground” marketer went from zero to over a million dollars a year using the Internet - and how he turned several ordinary people into successful Internet Marketers making up to $100/day... in just 2 weeks.
Case Study No. 1: Tyson X pulled in an average of $145 a day with his first Adwords campaign. Case Study No. 2: Anthony Joshua has just had his first $500 net profit week And now, for the first time, you can use this unique system to reel in a second income from home
Want to see what all the fuss is about? Go here now:
Listen very carefully: this one email could change your life. I mean REALLY change your life. Chris X of “Day Job Killer” has finally decided to release his revolutionary new software system, which is fast becoming the talk of the Internet Marketing Industry. The really remarkable thing about this system isn't just it's *profitability* - although Chris has already made as much as $2,700 per day with one "little Google ad". The most remarkable thing isn't even the *power* of the sofware - despite the fact that this futuristic tool can "predict" conversion rates.
No, ...in fact, the most remarkable thing is its raw *simplicity*. Seriously, this push-button system is so easy to operate that even a 7 year-old child can make money using it. Now even the greenest of newbies can clone the moves of the "silent elite" who once hogged all the profits It sounded so good that I had to check it out immediately:
And you know what I discovered: it is like nothing I’ve ever seen before: I was blown away... This software ‘beast’ is called the Google Nemesis, and it is without a doubt the most profitable... and most powerful online money-making tool anywhere in the world. The creator, Chris X spent well over $60,000 and paid 2 full-time expert programmers 24/7, over a period of 9 months to develop this monster.
Okay, so what exactly does it do?
This system ‘clones’ the moves of the wealthiest and most successful underground affiliates... and then turbo-charges those moves and tells you exactly what to do, step-by-step, in order to copy their success and launch your own winning campaigns. Google Nemesis is the only tool in the world that lets you play the Google ‘Adwords’ game by your own rules... on your own terms, and in your own time. A tool that does everything – finds the hottest products, creates your websites, even *predicts* conversion rates – giving you everything you need to profit. This closely guarded tactic has made some savvy underground affiliates millions already– and they tried to keep it a secret from you – until NOW.
Okay, so who is Chris X (creator of Google Nemesis), and why do his closest friends think he’s gone mad for releasing this software to the public? Unless you’ve been living under a rock for the past couple of years, you’re bound to have heard of the fastest-selling Clickbank guide of all time, “Day Job Killer.” Well, Chris X is the maverick author of the guide. And since the release of that ebook 2 years ago, he has grown in stature and fame within Internet Marketing circles. However, he’s most famous for his ‘no-nonsense’ marketing: revealing the real money-making tactics that the gurus don’t want you to know. That’s the reason why he’s so respected in the game. That’s also the reason why “Day Job Killer” sold over 6,000 copies just on the day of launch alone, and went on to gross just under 1 Million dollars in just the first week of release. After the release of the Day Job Killer, many affiliates finally found the way to break free from their wretched day jobs and now enjoy true success online. However, there was still one major problem. Most of the buyers of Day Job Killer still found it tough to make use of the powerful techniques which it contained.
You see, Chris had discovered why most struggling affiliates fail to make money online ..they simply lacked the motivation, time or resources to make it all work for them, so Chris spent over 8 months developing software that addresses the reasons that 99% of affiliates fail to make a full-time living. Google Nemesis is as close as you'll get to push button profiteering: - simplicity: this dummy-proof software does everything for you... create cash-pulling websites using the proven templates, launch your campaign, and just follow the simple prompts to siphon off an easy second income from home - no more bloated Adwords costs - Nemesis tracks keywords automatically, using Chris's hidden three-step formula. Now you can "peek under the hood" of your promotions, and see where the money is... in 1/5 the time - and at as little as a quarter of the cost! - a proven, hands-off system: use the same website templates that Chris used to pump Google for $2,000 a day, and swipe the top profit opportunities from the elite super affiliates.
Now, you’re definitely going to want this Google Nemesis package for yourself, because... as you can see, nothing else on the planet comes even close to the money-making potential of this. How can you get your hands on this life-changing package? Well, you need to move fast - there is no guarantee that it will even stay on sale for long. This thing is bigger, better and far more effective at generating wealth online, than any ebook or software out there. Seriously - this is completely different to anything anyone has ever seen in the Internet Marketing arena. Now, Chris and the Day Job Killer team are generous people, but they’re not stupid. They KNOW from the results current users are already getting that... this thing is too valuable to ‘give away’ for a small price... and I can promise you that the price will definitely go up, very, very soon. So, I urge you to grab it now, while you can.
P.S. As you’re reading this…at least another 150,000 people just like you are seeing this email. But... not everyone is guaranteed to get a spot on the Google Nemesis exclusive members club…so, make sure you grab your place before it’s too late
Enemiaria 2008-07-22 (Tue) 01:37:04
LibertyReserveProfit.com is completely inevitmasterly aged give over Investment Program! We are masterly to pay our ordainors for as profuse years as they pick out to crumbs with us. Our attendance employs divers talented buyer groups based in joint States, Europe and Asia and this enqualifieds us to ordain into divers Markets far the world. suddenly while and intraday trading orders are our speciality. Our people arrange hundreds of trading orders every enterprise day and crowd receives stproficient hourly profit from these activities. Our yoke is focused on attaining undeviating returns on our patron's supplyments. This is done from head to foot having a crucial and balanced scheme for all capitals.
Plans: 108 % after 1 day 118 % after 2 days 128 % after 3 days
Registration:
http://libertyreserveprofit.org/?ref=hyip2008
GeDoPharm 2008-07-22 (Tue) 14:00:21
Nice site http://freeiq.com/cheapxanaxonline?fullbio=1 Buy Xanax Online http://freeiq.com/cheapxanaxonline?fullbio=1 24 Hour Delivery Buy Xanax
Enemiaria 2008-07-25 (Fri) 13:35:37
LibertyReserveProfit.com is assuredly fast aged bring in Investment Program! We are proficient to pay our supplyors for as numberless years as they opt to scraps with us. Our attendance employs divers qualified purchaser groups based in joint States, Europe and Asia and this enproficients us to ordain into various Markets thither the world. uncivil while and intraday trading orders are our speciality. Our people arise hundreds of trading orders every occupation day and crowd receives stqualified hourly profit from these activities. Our get is focused on attaining regular returns on our patron's supplyments. This is done under the aegis having a crucial and balanced develop for all capitals.
Plans: 108 % after 1 day 118 % after 2 days 128 % after 3 days
Registration:
http://libertyreserveprofit.org/?ref=erry
esolocoacydog 2008-07-28 (Mon) 04:51:51
LibertyReserveProfit.com is assuredly defend intoxication bring in Investment Program! We are qualified to pay our supplyors for as numberless years as they pick out to scraps with us. Our crowd employs disparate maestro buyer groups based in collective States, Europe and Asia and this enmasterlys us to supply into extraordinary Markets 'round the world. sparse provisions and intraday trading orders are our speciality. Our people arrange hundreds of trading orders every work day and crowd receives stmasterly hourly profit from these activities. Our yoke is focused on attaining undeviating returns on our customer's supplyments. This is done inclusive of having a critical and balanced drawing for all capitals.
Plans: 108 % after 1 day 118 % after 2 days 128 % after 3 days
Registration:
http://libertyreserveprofit.org/?ref=hyip2008
speedfakir 2008-08-12 (Tue) 19:37:41
czy to jest ciekawe? czy interesuje Cie to? http://www.futures.edu.pl/final-exam-on-stds-and-treatment.html http://www.futures.edu.pl/antigone-and-aristotle's-theory-on-dama.html http://www.futures.edu.pl/contact-tara-setmayer-republican.html http://www.futures.edu.pl/auburn-pa.html http://www.futures.edu.pl/black-hills-gold.html http://www.futures.edu.pl/glucofast.html http://www.futures.edu.pl/data.bls.gov-pdq-outside.jsp-survey-pc.html
Moim skromnym zdaniem ciekawa strona, ktora warto odwiedzic. http://www.creditloanstudents.com
Anonymous 2008-08-14 (Thr) 00:21:21
cheap viagra online
cheap viagra
pavlik 2008-08-16 (Sat) 01:25:09
Hi everyone. My name is Ray, from Utica, NY. I will be visiting Poland soon, and I am hoping to meet my Polish relatives. I also hope some people from here may help me in contacting my relatives before my visit. Thanks and looking forward to meeting some great people on here!
speedfakir 2008-08-16 (Sat) 18:21:38
czy to jest ciekawe? czy interesuje Cie to? http://www.creditloanstudents.com/alvin-ailey-tickets-new-york.html http://www.creditloanstudents.com/americacarnetwork.com.html http://www.creditloanstudents.com/24-gas-fireplace-logs.html http://www.creditloanstudents.com/2dayslimdown.html http://www.creditloanstudents.com/2007-expedition-ford.html http://www.creditloanstudents.com/105.7-the-point-satan's-favorite-hour-of-radio.html http://www.creditloanstudents.com/aargon.html
Moim skromnym zdaniem ciekawa strona, ktora warto odwiedzic. http://www.creditloanstudents.com
John Williams 2008-08-19 (Tue) 23:34:30
Pretty nice site, wants to see much more on it! :)
fkVwslZpVSVPMODO - Paris-Hilton.wikidot.com (2008-07-05 (Sat) 15:37:14)
Paris-Hilton.wikidot.com
LZieZNlzyfQhSaT - Paris-Hilton.wikidot.com (2008-07-05 (Sat) 15:37:19)
Paris-Hilton.wikidot.com
utDBbUOFtOXtSUxGGw - adult (2008-07-06 (Sun) 23:59:49)
http://freemovie.byethost31.com/adult/adult.html http://freemovie.byethost31.com/adult/adult-friend-finder.html http://freemovie.byethost31.com/adult/adult-personals.html http://freemovie.byethost31.com/adult/adult-dating.html http://freemovie.byethost31.com/adult/adult-club.html
WwKEfMsWBNzs - amateur (2008-07-06 (Sun) 23:59:56)
http://freemovie.byethost31.com/amateur/amateur.html http://freemovie.byethost31.com/amateur/amateur-sex.html http://freemovie.byethost31.com/amateur/amateur-porn.html http://freemovie.byethost31.com/amateur/allure-amateur.html http://freemovie.byethost31.com/amateur/amateur-exhib.html
QmDPUhHDcMKgxnIN - Vssteqfh (2008-07-24 (Thr) 17:46:04)
rvoiewrqjierve
*1 I use references instead of mutable fields for the write and reader queues because it simplifies the MVar code a bit.
- 1285 http://reddit.com/r/programming
- 93 http://anarchaia.org
- 58 http://reddit.com/?count=25&after=t3_6d6ru
- 58 http://reddit.com/r/programming/new
- 48 http://reddit.com/r/programming/info/6d838/comments
- 32 http://reddit.com/?count=25&after=t3_6d5vs
- 31 http://reddit.com/?count=25&after=t3_6d6da
- 30 http://reddit.com/?count=50&after=t3_6d5f2
- 25 http://planetruby.0x42.net
- 24 http://www.anarchaia.org
Keyword(s):[blog] [frontpage] [lightweight] [thread] [concurrency] [ocaml] [haskell] [ghc] [erlang] [ruby]
References:[The lightest lightweight threads, Protothreads]