The Details of Parallel Processing Are the Hardest Part

Mike Ash has an absolutely excellent post up today discussing the importance of the details when designing an application for parallel processing.

His post focuses on using Grand Central Dispatch (part of Mac OS X 10.6 “Snow Leopard”), but the core point is valid for any approach to multiprocessing: it’s not just a simple matter of queueing jobs. You have to consider the impact of those jobs on the system as a whole.

Running Sweepers from a Model

Oh, the pain. Over the last 24 hours I have fought an exhausting battle with Rails and the testing environment to do a couple of seemingly simple things:

  1. Expire cached pages and fragments from outside the context of a normal HTTP request
  2. Test it.

Testing, in particular, is particularly difficult because Rails does not offer any built-in mechanism to test an application’s caching, and I’ve had problems in the past with the only plug-in that does it (cache_test) it on Rails 2.x. I’ve released a new plug-in, called Banker, that provides assertions and the necessary support to test caching, including Shoulda macros.

The first item has come up for me more than once. Complex applications often have scheduled jobs that make changes to the database. If the application does any caching, there is a good chance that these jobs will affect content that is cached. The problem is that expiring caches from outside the context of a controller + request is a pain. Here is the solution:

Create your sweepers as you normally would. Then, either within test code or your script/runner code:

def setup_cache_sweepers(*sweepers)
  sweepers = sweepers.flatten
 
  ActiveRecord::Base.observers = sweepers
  ActiveRecord::Base.instantiate_observers
 
  returning ActionController::Base.new do |controller|
    controller.request = ActionController::TestRequest.new
    controller.request.host = URL_HOST
    controller.instance_eval do
      @url = ActionController::UrlRewriter.new(request, {})
    end
 
    sweepers.each do |sweeper|
      sweeper.instance.controller = controller
    end
  end
end

URL_HOST is a constant, defined in each environment file, with the host:port part of a URL. It is needed in order to generate URLs, and more importantly, fragment cache keys, outside the context of an HTTP request.

The method returns a controller. For unit test code, hang on to it, because it gives you access to named routes:

@controller.send(:users_path)

For code run from script/runner, nothing else is needed. You’ve already instantiated your sweepers and given them the controller instance, so anything you do to a model instance that generates a callback to the sweeper will use that controller, including calling cache expiration methods.

Manage vendor/rails with Git on a Subversion Project

Here is something I’ve been experimenting with over the last week or so, and it’s working out very nicely so far:

Use Git to manage vendor/rails when your project is using Subversion.

Here’s how:

First time freezing? Easy:

  1. cd vendor
  2. git clone git://github.com/rails/rails.git
  3. svn add -N rails; svn ps svn:ignore .git rails
  4. cd rails; git checkout v2.3.3.1 (or whatever version you want)
  5. svn add *

Switching to a different version of Rails is now as simple as:

  1. cd vendor/rails
  2. git checkout master
  3. git pull origin master
  4. git checkout whatever
  5. svn add `svn st | grep ^\? | cut -f7 -d" "`
  6. svn rm `svn st | grep ^\! | cut -f7 -d" "`
  7. svn commit

If you already have a vendor/rails and you want to try this technique out, you should first clone the Git repository to a temporary directory, git checkout the version that matches what is already in your vendor/rails, then copy (or move) the .git directory into vendor/rails. Add the svn:ignore property as above and you’re all set.

Here’s why I’m doing this: when a new Rails release comes out, some files are changed, some added and others deleted. Because Subversion litters a checked-out project with its .svn directories, you can’t just delete the entire thing and re-freeze without losing local patches (yes, I’ve had to do this) and history (which isn’t terribly important, but can be nice). Even ignoring those two reasons, completely deleting and adding vendor/rails will cause your Subversion repository to grow more than necessary (Rails 2.3.3 checks in at 35 MB).

Git, by putting everything in a top-level .git directory, makes itself easy for Subversion to ignore. Checking out a tag to switch to a different release is simple, and Git deletes files, unlike applying a patch, which truncates deleted files to 0 bytes, requiring a find to actually remove them.