Over the last couple of days I’ve been bringing up an isolated test environment for a customer’s new site. (As an aside, one of the great things about moving to an Intel Mac is that I can run nearly any OS I want under VMware Fusion at near-native speeds. You can’t beat testing in an identical environment, and I can throw pretty respectable virtual hardware at it, too: up to four cores and gigs of memory. If only Apple would let me virtualize OS X client.)
I’m using httperf to simulate client load on the test server and quickly decided that --wsesslog looked like the best choice for simulating an actual browser’s effect on the server.
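For anyone who hasn’t seen one, a wsesslog file is just lines of request paths: a line starting at the left margin begins a burst, and any indented lines that follow are requested as part of that same burst, which is roughly what a browser does when it fetches a page and then its assets. Blank lines separate sessions and lines starting with # are comments. Here is a minimal sketch of that layout; the helper name wsesslog_session and the paths are made up for illustration.

```ruby
# Build one wsesslog session: the page itself starts at the left margin,
# and each asset is indented so httperf requests it in the same burst.
def wsesslog_session(page_path, asset_paths, indent = "    ")
  lines = [page_path]
  asset_paths.each { |path| lines << "#{indent}#{path}" }
  lines.join("\n") + "\n"
end

# Hypothetical paths, for illustration only.
print wsesslog_session("/index.html", ["/css/site.css", "/js/app.js"])
# /index.html
#     /css/site.css
#     /js/app.js
```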
A problem: how to generate those session workloads? I certainly don’t want to do this by hand for even one page. I want to generate a hit on every file referenced by the target page, but ignore anything hosted elsewhere.
A solution:
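#!/usr/bin/env ruby
# Generate an httperf --wsesslog session for a single page: the page itself,
# followed by every stylesheet, script and image it references on the same host.
require 'rubygems'
require 'hpricot'
require 'open-uri'

if ARGV.length < 1
  $stderr.puts "usage: #{$0} url"
  $stderr.puts "'url' must include the protocol prefix, e.g. http://"
  exit 1
end

url = ARGV.shift

# Split the URL into protocol, host (with optional port), directory and document.
if url =~ %r{^(https?://)([-a-z0-9.]+(:\d+)?)(.*/)([^/]*)$}i
  $protocol = $1
  $host = $2
  $document_dir = $4
  document_url = $5
else
  $stderr.puts 'Could not parse protocol and host from URL'
  exit 1
end

# On Ruby 2.5 and later, use URI.open(url) here instead.
doc = Hpricot(open(url))

# Print an indented wsesslog line for uri if it lives on the target host;
# silently skip anything hosted elsewhere.
def puts_link(uri)
  return if uri.nil?
  # Escape the prefix so dots in the host name are matched literally.
  if uri =~ %r{^#{Regexp.escape($protocol + $host)}(/.*)$}i
    # Absolute URL on our host: keep only the path.
    puts " #{$1}"
  elsif uri !~ %r{^https?://}i
    if uri =~ %r{^/}
      # Host-relative path: use as-is.
      puts " #{uri}"
    else
      # Document-relative path: prefix the page's directory.
      puts " #{$document_dir}#{uri}"
    end
  end
end

puts "# httperf wsesslog for #{url} generated #{Time.now}"
puts
# The page itself starts the burst; everything after it is indented.
puts "#{$document_dir}#{document_url}"

(doc/"link[@rel='stylesheet']").each do |stylesheet|
  puts_link stylesheet.attributes['href']
end

(doc/"style").each do |style|
  # Match @import 'file.css'; or @import "file.css"; one statement at a time.
  style.inner_html.scan(/@import\s+(['"])([^'"]+)\1\s*;/).each do |match|
    puts_link match[1]
  end
end

(doc/"script").each do |script|
  puts_link script.attributes['src']
end

(doc/"img").each do |img|
  puts_link img.attributes['src']
end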
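One limitation worth noting: puts_link handles document-relative paths by plain concatenation, so a reference like ../css/site.css on a page under /blog/ is written out as /blog/../css/site.css rather than /css/site.css. A possible alternative, sketched here with Ruby’s standard URI library instead of the regexes the script uses, is to resolve each reference against the page URL first and keep it only when the scheme, host and port match. The name same_site_path and the example URLs are my own inventions for this sketch.

```ruby
require 'uri'

# Resolve href against the page URL and return the request path if it points
# at the same scheme, host and port; return nil for anything hosted elsewhere
# or anything that is not an http(s) URL at all.
def same_site_path(page_url, href)
  return nil if href.nil? || href.strip.empty?
  base = URI.parse(page_url)
  target = URI.join(page_url, href.strip)
  return nil unless target.is_a?(URI::HTTP) # URI::HTTPS is a subclass
  return nil unless target.scheme == base.scheme &&
                    target.host == base.host &&
                    target.port == base.port
  target.request_uri
rescue URI::InvalidURIError
  nil
end

page = "http://example.com/blog/post.html" # hypothetical page URL
p same_site_path(page, "images/logo.png")             # "/blog/images/logo.png"
p same_site_path(page, "../css/site.css")             # "/css/site.css"
p same_site_path(page, "http://cdn.example.net/x.js") # nil
```

Swapping this in would mean having puts_link print the returned path whenever it is not nil, at the cost of dropping the hand-rolled URL parsing at the top of the script.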