
Rubinius vs. the benchmark from hell

October 16, 2010

This Mandelbrot benchmark really highlights Ruby’s weakest bit.  Read the chart.  Breathe.  Fully internalize that the newest, hottest Ruby interpreter takes an hour and twenty minutes to do a simple render that C can do in 23 seconds.

Yup.  Ruby is 200x slower than C here.  The benchmark page doesn’t show Ruby 1.8.7 any more, but if you saw it, it would be more than 500x slower than C.  On Stack Overflow, according to someone who sounds like they know their stuff, the big boost between 1.8 and 1.9 is that Ruby 1.9 can inline some basic math.  I’ll take his word for it; I still can’t read the YARV internals worth beans.

Even with that boost, though, the performance is still bad; everything’s a method invocation, and it really, really hurts to watch this benchmark kick Ruby in its soft and tender bits.
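
To make that concrete: in Ruby, even 1 + 2 is a method call on an object, and that method can be redefined at runtime, which is exactly what makes naive inlining dangerous. A contrived illustration (1.8/1.9-era Fixnum):

class Fixnum
  alias_method :original_plus, :+
  def +(other)
    # every Fixnum addition in the process now dispatches through this Ruby method
    original_plus(other)
  end
end

puts 1 + 2   # still 3, but a tight loop of additions gets dramatically slower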

I’ve been very interested lately by the different performance profile of Rubinius.  I feel it has a lot philosophically in common with JRuby, with 100% less Oracle dependency.  Especially once its hydra branch kills the GIL and introduces true multiprocessing, Rubinius could be something I get real use out of at work.

The best way for me to “benchmark” Rubinius is for me to just run my big Ruby apps on it.  On balance for my real-world applications, Rubinius 1.1 seems to perform about the same as 1.8.7, occasionally reaching into the range of 1.9.2.  (@headius will be happy to know that under heavy load, JRuby still beats the snot out of all of ’em.  Different story.)

This is a big deal for me; I could actually make a lateral move from 1.8.7 or ree to rbx and nobody would even know!  Neat.

Except in one case.  I have an old branch of a Rails app, created in-house by another developer, which does a lot of computationally intense operations in Ruby code.  It’s a geo app, and those slow computations made this branch utterly unusable.  In the shipped version, I delegated all that to GeoServer, where it’s quite peppy.  But for giggles, I thought I’d run the original branch under Rubinius.

Okay, I admit, it still sucked out loud.  But it was doing its work in much less time than on 1.8.7.  Which made me wonder, and made me get out a copy of this depressing Mandelbrot bench.

Rubinius, shockingly, ends up beating 1.9.2 by a noticeable margin on this benchmark.  At 1000 iterations, though, even with the JIT set very aggressively and the compiler dumping horrifying spaghetti for this simple bench, 1.9.2 still has the win.

Ruby 1.9.2:

rob@hurricane:~$ time ruby mandelbrot.rb 1000 > mandelbrot-yarv.out
real	0m19.387s
user	0m19.380s
sys	0m0.000s

vs. Rubinius:

rob@hurricane:~$ time rbx -Xjit.sync=true -Xjit.call_til_compile=1 mandelbrot.rb 1000 > mandelbrot-rbx.out
real	0m20.802s
user	0m19.360s
sys	0m1.120s

But when you take out the compilation time and run for longer at a steady state, Rubinius starts to win!  Rubinius takes the lead, on my machine, somewhere between 2000 and 2500 iterations.  On the full run (a 16000×16000 render), Rubinius crosses the finish line a whole minute ahead of 1.9.2.  Whoa.
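
One crude way to factor the compile time out in a harness of your own (a sketch of the idea, not the shootout's methodology) is to run the hot code once before timing it, so rbx has already JIT-compiled it:

require 'benchmark'

def work(n)
  s = 0.0
  i = 1
  while i <= n
    s += Math.sin(i * 0.001)
    i += 1
  end
  s
end

work(500_000)                                 # warm-up pass: triggers compilation
puts Benchmark.realtime { work(5_000_000) }   # roughly steady-state time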

[Edit] Here are the final standings amongst the legitimate contenders (leaving 1.8.7 out, ’cause who has that kind of time?)

JRuby: 85m49.466s
1.9.2: 85m10.357s
Rubinius: 83m49.293s

[Further Edit] Read the comments below.  Rubinius’s Brian Ford links to a gist which explains what’s slow, and optimizes the test script to cut the execution time substantially — almost in half.  Haven’t tried it yet, but wow!

What’s potentially very neat is this: as near as I can read the JIT’s output as it flies by, Rubinius’s math *isn’t* being inlined. Looks like all ordinary object operations to me.  And based on another quick looping 5-line bench I wrote, floating point divides in Rubinius continue to perform tragically, whereas they’re almost free in 1.9.2.
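
The gist of that quick bench (a minimal stand-in, not the exact script), timed under both 1.9.2 and rbx:

x = 0.0
i = 1.0
while i < 5_000_000.0
  x += 1.0 / i   # the float divide is the interesting bit
  i += 1.0
end
puts x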

So I’m guessing — and hoping — that Ruby 1.9.2 and Rubinius have each optimized a completely different aspect of the problem.  This bears further study!  If the approaches can be combined, it would be great to get Ruby up into the range of merely crummy performance (like Python) on this benchmark, and out of the embarrassing basement.

If the improvements are indeed multiplicative, this would bode well for the ability to do some compute-intensive operations directly in Ruby from time to time when expedient, without always having to drop to a language with primitives and less OO noise.  In practice, I see people getting away with little compute-heavy Python scripts all the time.  I see nobody daring it in Ruby 1.8.7, for obvious reasons.

Three cheers for Rubinius and Evan Phoenix, anyway, for being the non-JVM Ruby to win this horrid bench for now.


19 Comments
  1. Hi,

    Excited that you are interested in Rubinius performance. But see http://gist.github.com/632443. Drop into #rubinius channel on freenode any time. We’d love to chat about performance and how your app actually runs on Rubinius.

    Cheers,
    Brian

    • Thanks for the comment, Brian. And brilliant gist. I love it when simple optimizations make the VM do its thing better. I’ll post an edit.

      Believe me, I’m quite aware that C-vs-Ruby constitutes a full-on fruit mismatch. Perhaps the more relevant comparison is the Python-vs-Ruby one; I can’t think of any philosophical reason why Ruby must *substantially* underperform Python on a measurement of optimal code in each language, and I think your gist demonstrates that.

    • An exciting thing about Rubinius – it pushes you to use more small methods instead of one big one, to use more OO.
      For example, moving the calculation into a separate object speeds up Brian’s variant by 10% (https://gist.github.com/1295228)!!! Even JRuby is slower with methods.
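
      Roughly the shape of that change, as a sketch (not the actual gist code):

      class Mandelbrot
        LIMIT_SQUARED = 4.0
        MAX_ITERATIONS = 50

        # a small, self-contained method gives the JIT a compact unit to compile
        def escapes?(cr, ci)
          zr = zi = 0.0
          MAX_ITERATIONS.times do
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            return true if zr * zr + zi * zi > LIMIT_SQUARED
          end
          false
        end
      end

      Mandelbrot.new.escapes?(-0.5, 0.0)   # => false, that point is inside the set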

  2. rofh

    +1 for Brian’s comment.

    Also, see the C and Python programs, for example. They are parallelized (Python with multiprocessing, C with pthreads). That makes a huge difference too.

    It’s not rare in the shootout game to have benchmarks that just call out to a ‘gmp’ binding. Insane! I propose boycotting these benchmarks entirely.

    • Yeah, I noticed one of the high-performing Lua scores in the shootout game needed a C extension (whatever they call it) for Lua; seems like cheating to me. Lua hardly needs the cheat; it’s fast enough interpreted, and even faster with luajit. In fact, ruby-lua-bridge is a nice way to speed up computation tasks in Ruby 1.8 without having to compile any native code.

      • Makes you wonder why luajit is SO much faster than Ruby (MRI/Rubinius/JRuby). Would be nice if they had similar benchmarks.

  3. cbmeeks

    C faster than Ruby in CPU intensive benchmarks? Who’d a thunk it?

    I’ve written many graphics engines in C and C++. Even 16 bit Turbo C++ and 32 bit gcc (DJGPP). I wouldn’t expect to re-write those in Ruby.

    You know what they say about benchmarks….

    • The point of the post was that new and alternative Ruby implementations are getting much faster in places where they have traditionally been very weak. Since most of Rubinius is written *in* Ruby, in places where MRI uses C, I think it’s very interesting that it’s emerging as one of the more performant implementations.

      While it’s tempting to say “of course” you wouldn’t write a graphics engine in a language like Ruby, conventional wisdom says you shouldn’t use it to write the bulk of a compiler either. But Rubinius has done just that, and it’s performing surprisingly well vis-a-vis its C-coded competitors. And the code is far more readable and approachable. I think that’s a very notable result.

      Comparing benchmarks under a microscope is pointless, I agree. But when a benchmark demonstrates some really bad behavior in your language, an order of magnitude worse than other languages with similar features and tradeoffs (like Python), I think it reflects an issue worth fixing. Rubinius’s tweaked performance, and MagLev’s, are close to Python’s, showing it’s not a lost cause. As Rubinius appears to have a lot more performance headroom left to explore, and a very maintainable and extendable codebase, I hope it has a bright future as a Ruby implementation of choice.

  4. FWIW, we’ve spent less time optimizing JRuby than just about any Ruby impl. In JRuby, blocks are still much heavier than 1.9, no dynamic calls are ever inlined, most dynamic calls from Java code are uncached, and so on. As a result, the majority of the JVM’s own optimizations don’t even apply to Ruby code right now.

    Trivial tweaks to some parts of JRuby have sped up Ruby code as much as 5 times. We’re hoping to get more perf work done in 1.6, but already the various mandelbrot variations run in about 25% less time than 1.9.2, with lots of room to improve. Add to this the fact that the various JVM implementers are just now working to make dynamic languages and closures perform well…and JRuby’s future is as bright as any.

    • True, and hence all my big apps are running on JRuby for now. I remember that the first time I was able to run a JDK with invokedynamic support, I got a noticeable performance boost on some real-world JRuby applications. Unfortunately I have a non-zero, but dwindling, number of stopper bugs (non-JRuby-related) with newish OpenJDKs that keep me on Sun JDK 6 in production.

  5. David

    Interesting. As a casual Python user I’ve taken an interest in Ruby lately, mostly because of MacRuby. Let’s just say things about Ruby rub me the wrong way, but I could see myself adapting if Ruby became the rapid development language of choice on a Mac.

    I’m actually surprised that Rubinius didn’t do better. But again, my knowledge of the Ruby world is thin; is it a question of an immature JIT?

  6. Josh Cheek

    Why has no one complained about the test? I looked at the C, Java, Python, and Clojure implementations from that site; every single one of them was using concurrency. The Ruby implementation did it all in a single big loop. The problem isn’t that Ruby is slow; the problem is that the Ruby implementation is at a disadvantage.

    • It would be nice to contribute a concurrent implementation to the shootout. It would highlight a number of other issues between different Ruby implementations, too. Concurrency on the original MRI was not a strength, and frequently would slow things down. I suspect that is why this code doesn’t use it. With green threads that don’t really leverage multiple cores, and a global interpreter lock, I think it would not improve things on 1.8 MRI in any environment.

      On the other Rubies, 1.9, JRuby and Rubinius for sure, and I would expect MagLev as well, I think concurrency could show an improvement on multiple cores. If you’re interested in writing it, I’d start with Brian’s gist – http://gist.github.com/632443 – which doesn’t seem to hurt any Rubies’ scores and gives a really big improvement on Rubinius.
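
      Just to sketch the shape such a submission might take (my own guess, not actual shootout code; this prints ASCII instead of the binary PBM the real bench emits), forking one worker per core and giving each a band of rows:

      size = (ARGV[0] || 200).to_i
      workers = 4

      # per-row work, inlined so the example stands alone
      def render_row(y, size)
        ci = 2.0 * y / size - 1.0
        (0...size).map do |x|
          cr = 2.0 * x / size - 1.5
          zr = zi = 0.0
          bounded = true
          50.times do
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            if zr * zr + zi * zi > 4.0
              bounded = false
              break
            end
          end
          bounded ? '*' : ' '
        end.join << "\n"
      end

      # fork a worker per band; each sends its rendered rows back over a pipe
      pipes = (0...workers).map do |w|
        rd, wr = IO.pipe
        fork do
          rd.close
          rows = (w * size / workers...(w + 1) * size / workers).map { |y| render_row(y, size) }
          wr.write(rows.join)
          wr.close
          exit!
        end
        wr.close
        rd
      end

      pipes.each { |rd| print rd.read }   # bands arrive back in row order
      Process.waitall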

    • karatedog

      Maybe because the test says “x64 Ubuntu : Intel® Q6600® one core”? 🙂

  7. thk

    JFYI, the only cases where Python outperforms Ruby 1.9 in these microbenchmarks are the aforementioned mandelbrot fractal computation and pidigits. But while all the Ruby code is “pure”, the Python code “cheats” with the use of high-performance C libraries!

    mandelbrot.py uses the ‘array’ module for efficient storage of statically typed values, i.e. it avoids standard lists.

    pidigits.py uses the external ‘gmpy’ C/C++ multiprecision math library.

    In addition, the comparison on multi-core machines is flawed because the Ruby code uses only one CPU, whereas the Python code uses all four thanks to the multiprocessing library. But Ruby could achieve the same behaviour simply by running processes on all cores, with the parallel gem, for example.
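
    Just as a sketch of that last point (option names from memory, worth double-checking against the parallel gem’s docs):

    require 'parallel'   # gem install parallel

    # spread a trivial computation across 4 worker processes
    squares = Parallel.map((1..100_000).to_a, :in_processes => 4) { |i| i * i }
    puts squares.length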

    To sum it up:

    1) Examine and think twice before you pass final judgement.
    2) It’s incredible how much the performance improved in Ruby 1.9 running on YARV, while all of the broad metaprogramming abilities were preserved at the same time.

    • Agreed on all counts. I’m mostly using Ruby 1.9.2 and JRuby in production now and am quite happy with both. Rubinius surprised me across the board by suddenly measuring up to both 1.9.2 and JRuby in performance and compatibility — not just in this benchmark, but in real applications.

  8. karatedog

    I just did a simple nested FOR loop (1..16000) in Ruby that does exactly nothing.
    It was run on a Lenovo T400, in a VirtualBoxed environment (just to list the handicaps).

    ============
    for i in 1..16000 do
      for j in 1..16000 do
      end
    end
    ============
    Running time:
    real 0m48.693s
    user 0m33.398s
    sys 0m0.576s

    —————————————

    A WHILE loop is better (“succ” is faster than “+” as it won’t check for class change):

    ============
    i, j = 1, 1
    while i < 16000 do
      while j < 16000 do
        j = j.succ
      end
      j = 1
      i = i.succ
    end
    ============
    Running time:
    real 0m26.641s
    user 0m16.545s
    sys 0m0.244s

    This is still a joke.

  9. I’m also a bit concerned about the code, especially since Ruby, as one of only a few languages, has a native implementation of complex-number math operations. Why would you transform the C code line for line instead of trying the native and probably much faster code?

    Something like this:

    require 'complex'

    def mandelbrot(a)
      Array.new(50, a).inject(a) { |z, c| z*z + c }
    end

    (1.0).step(-1, -0.05) do |y|
      (-2.0).step(0.5, 0.0315) do |x|
        print mandelbrot(Complex(x, y)).abs < 2 ? '*' : ' '
      end
      puts
    end

  10. Mandelbrot is a Float micro-benchmark. Any runtime that boxes up doubles is gonna suck. Type inference is difficult with Ruby semantics because definitions can be pulled out from underneath the running code. See the Ungar et al. Self papers on using runtime type statistics for method specialization. A difficulty is deoptimizing when the semantics change.
