Speed Up MRI Ruby 1.9

Making the rounds recently is a patch for MRI Ruby 1.9.3 from funny-falcon. It backports some changes coming in Ruby 2.0 that improve start-up time and method lookup. A second version of his patch also backports some changes to the garbage collector to make it more friendly to copy-on-write.

Testing this in my development environment yielded some impressive gains. As a simple test, I ran bin/rake routes three times for each configuration. Here is the best time from each set, with various tweaks to the standard Ruby install:

Ruby                                                                  Time     vs. baseline
1.9.3-p327                                                            14.67s   baseline
1.9.3-p327, clang -march=native -Os                                   14.11s   96.1%
1.9.3-p327, clang -march=native -Os + falcon patch                    8.74s    59.6%
1.9.3-p327, clang -march=native -Os + falcon patch + GC environment   6.80s    46.4%

The third column shows each time as a percentage of the baseline, so lower is better. Falcon’s patch makes a big difference, as does setting two environment variables to tune when the garbage collector runs.
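For anyone who wants to repeat the measurement, here is a minimal sketch of a best-of-three timer plus the arithmetic behind the percentage column. The helper names best_of and percent_of_baseline are made up for illustration, and the figures fed in are the ones from the table above.

```ruby
require "benchmark"

# Run the block `runs` times and return the fastest wall-clock time in seconds.
# (Hypothetical helper, not part of any library.)
def best_of(runs = 3)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Express a time as a percentage of the baseline, rounded to one decimal place.
def percent_of_baseline(time, baseline)
  (time / baseline * 100).round(1)
end

# Figures taken from the table above.
baseline = 14.67
puts percent_of_baseline(8.74, baseline) # falcon patch
puts percent_of_baseline(6.80, baseline) # falcon patch + GC environment
```

To time the real command, something like best_of(3) { system("bin/rake routes", out: File::NULL) } should work from the root of a Rails app.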

Optimization Flags

You may not know that neither rbenv nor RVM passes any optimization flags to the compiler when it builds Ruby.

A very simple way to get a small bump in performance is to pass those flags yourself:

$ CC="clang" CXX="clang++" CFLAGS="-march=native -Os" rbenv install 1.9.3-p327

-Os optimizes for size. On modern processors, this is often better than -O3 because a smaller code footprint gets more code into the CPU caches. (If you look at the default release optimization level for a new Xcode project, you’ll find that Apple agrees with me.)

-march=native tells the compiler to tune the code it produces for the CPU in your machine. That suits this case, where the Ruby binary only runs on the machine that built it, but it shouldn’t be used for code you distribute to others.

There is a catch, though: rbenv usually uses GCC, but the version of GCC distributed by Apple doesn’t know about -march=native. Clang does. A Clang-built Ruby knows to use Clang for gem native extensions, too. Even though Ruby itself isn’t written in C++, a gem might be, which is why you must set both CC and CXX.
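If you want to confirm which compiler and flags a given Ruby was actually built with, RbConfig records them. This is a minimal sketch; build_settings is a hypothetical helper name, and the exact values you see will depend on your platform and how Ruby was installed.

```ruby
require "rbconfig"

# Collect the compiler settings recorded when this Ruby was built.
# Keys missing on a given platform come back as nil.
# (Hypothetical helper, not part of RbConfig itself.)
def build_settings(keys = %w[CC CXX CFLAGS])
  keys.each_with_object({}) { |key, out| out[key] = RbConfig::CONFIG[key] }
end

build_settings.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Run it with the freshly installed Ruby. If the Clang build worked, CC should mention clang and CFLAGS should include the flags you passed.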

An optimized Ruby provides about a 4% speed-up. Not bad for a simple re-install, but we’re just getting started.

Applying Falcon’s Patch

You can follow the instructions at this gist to build a new Ruby 1.9.3-p327 with Falcon’s patch applied. You’ll have a new local version of 1.9.3-p327 called 1.9.3-p327-perf.

Export CC, CXX and CFLAGS before you run the curl command so that the patched Ruby is also built with Clang.

$ export CC="clang"
$ export CXX="clang++"
$ export CFLAGS="-march=native -Os"
$ curl https://raw.github.com/gist/1688857/rbenv.sh | sh

This is good for about a 40% improvement over stock MRI 1.9.3.

Tune the Garbage Collector

Similar to Ruby Enterprise Edition, you can tune the garbage collector to run less frequently at the expense of larger Ruby processes. The argument goes that 10 faster, larger processes are better than 20 slower, smaller processes.

Set these environment variables in your ~/.bash_profile (zsh users, you know what to do):

export RUBY_GC_MALLOC_LIMIT=60000000
export RUBY_FREE_MIN=200000

Combined with the optimized build and Falcon’s patch, that’s good for an almost 54% improvement over stock MRI 1.9.3.
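To check that a process actually picked up these settings, and to get a rough feel for how often the collector runs, a sketch like the following may help. The helpers gc_env and gc_runs are illustrative names only, and the variable names are the ones used above for 1.9.3; newer Rubies may name or treat them differently.

```ruby
# Report the GC tuning variables visible to this process (nil if unset).
# (Hypothetical helper; `env` defaults to the real environment.)
def gc_env(env = ENV)
  %w[RUBY_GC_MALLOC_LIMIT RUBY_FREE_MIN].each_with_object({}) do |name, seen|
    seen[name] = env[name]
  end
end

# Count how many garbage collections happen while the block runs.
def gc_runs
  before = GC.count
  yield
  GC.count - before
end

p gc_env
# Allocate a pile of short-lived strings and see how many GC runs that costs.
p gc_runs { 200_000.times.map { |i| "string #{i}" } }
```

Running it once with the variables exported and once without gives a quick comparison of GC counts for the same workload.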

Production?

I’m going to run this in my development environment for a while before I roll these changes, especially the Falcon patch, into production, but these early results are promising.

Update: -march=native appears to be a bad idea on a Linode VPS. The Ruby compile fails fairly quickly with this:

linking miniruby
make: *** [.rbconfig.time] Illegal instruction

Further investigation also shows that, on Debian 6 at least, CFLAGS is set to something sane, so there is not as great a need to override it.