Generate wsesslog Workloads for httperf
Over the last couple of days I’ve been bringing up an isolated test environment for a customer’s new site. (As an aside, one of the great things about moving to an Intel Mac is that I can run nearly any OS I want under VMware Fusion at near-native speeds. You can’t beat testing in an identical environment, and I can throw pretty respectable virtual hardware at it, too: up to four cores with gigs of memory. If only Apple would let me virtualize OS X client.)
I’m using httperf to simulate client load on the test server and quickly decided that --wsesslog looked like the best choice for simulating an actual browser’s effect on the server.
A problem: how to generate those session workloads? I certainly don’t want to do this by hand for even one page. I want to generate a hit on every file referenced by the target page, but ignore anything hosted elsewhere.
A solution:
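#!/usr/bin/env ruby
# Generate an httperf --wsesslog session for one page: a request for the page
# itself, followed by indented requests for the stylesheets, scripts, and
# images it references on the same host.

require 'rubygems'
require 'hpricot'
require 'open-uri'

if ARGV.length < 1
  $stderr.puts "usage: #{$0} url"
  $stderr.puts "'url' must include the protocol prefix, e.g. http://"
  exit 1
end

url = ARGV.shift

# Split the URL into protocol, host (with optional port), directory, and document name.
if url =~ %r{^(https?://)([-a-z0-9.]+(:\d+)?)(.*/)([^/]*)$}i
  $protocol = $1
  $host = $2
  $document_dir = $4
  document_url = $5
else
  $stderr.puts 'Could not parse protocol and host from URL'
  exit 1
end

doc = Hpricot(open(url))

# Print an indented request line for uri if it is served by the target host.
# Absolute URLs on other hosts are skipped; relative paths are resolved
# against the document's directory.
def puts_link(uri)
  return if uri.nil?
  if uri =~ %r{^#{$protocol}#{$host}(.*)$}
    puts " #{$1}"
  elsif uri !~ %r{^https?://}
    if uri =~ %r{^/}
      puts " #{uri}"
    else
      puts " #{$document_dir}#{uri}"
    end
  end
end

puts "# httperf wsesslog for #{url} generated #{Time.now}"
puts
puts "#{$document_dir}#{document_url}"

# Linked stylesheets
(doc/"link[@rel='stylesheet']").each do |stylesheet|
  puts_link stylesheet.attributes['href']
end

# Stylesheets pulled in with @import "path"; inside <style> blocks.
# A backreference cannot be used inside a character class, so the path is
# matched lazily up to the quote that closes the one captured in group 1.
(doc/"style").each do |style|
  style.inner_html.scan(/@import\s+(['"])(.+?)\1;/).each do |match|
    puts_link match[1]
  end
end

# Scripts
(doc/"script").each do |script|
  puts_link script.attributes['src']
end

# Images
(doc/"img").each do |img|
  puts_link img.attributes['src']
end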
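The "ignore anything hosted elsewhere" rule is the heart of the script, so here is a rough sketch of the same idea using only Ruby's standard `uri` library instead of regular expressions. This is not what the script above does internally: `same_host_path` is a hypothetical helper name, and the `example.com` URLs are made up for illustration. It assumes a link counts as "ours" when its scheme, host, and port all match the page's.

```ruby
require 'uri'

# Hypothetical helper, not part of the original script.
# Resolves link against page_url and returns the request path when the
# result has the same scheme, host, and port as the page; otherwise nil.
def same_host_path(page_url, link)
  return nil if link.nil? || link.empty?

  page   = URI.parse(page_url)
  target = URI.join(page_url, link)

  # Skip mailto:, data:, and other non-HTTP links.
  # URI::HTTPS is a subclass of URI::HTTP, so both pass this check.
  return nil unless target.is_a?(URI::HTTP)
  return nil unless target.scheme == page.scheme &&
                    target.host == page.host &&
                    target.port == page.port

  target.request_uri
rescue URI::Error
  # Treat unparseable links as "not ours".
  nil
end

# Example: print wsesslog-style indented lines for links that survive the filter.
page  = 'http://example.com/blog/post.html'
links = ['css/site.css', '/img/logo.png', 'http://cdn.example.net/lib.js']
links.each do |link|
  path = same_host_path(page, link)
  puts " #{path}" if path
end
```

Letting `URI.join` do the resolution also handles `../` segments and protocol-relative links (`//host/path`), at the cost of being a little stricter about malformed URLs.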
How do I use this script? What are the arguments to this Ruby program?
It takes a single argument: the URL of the page to generate the workload for, including the protocol prefix (e.g. http://). The workload is written to standard output.