Popular Sites with Apache mod_status Enabled

Earlier this week, Sucuri Security researcher Daniel Cid revealed that a very large number of popular sites expose their /server-status page to the world.

I was pretty sure the sites I run for myself and my customers were OK, but since paranoia is a good trait in a security-conscious techie, I double-checked. Imagine my surprise when I found that one of my sites did the very same thing, as did one of my customers'.
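A quick way to double-check a site is to request /server-status from outside the network and see what comes back. The sketch below is only an illustration: the helper names are hypothetical, and it assumes an exposed page answers with HTTP 200 and contains the "Apache Server Status" heading that mod_status normally prints.

```ruby
require 'net/http'
require 'uri'

# Returns true when a response looks like a publicly readable mod_status page.
# Assumption: an exposed page returns 200 and includes the heading text
# "Apache Server Status".
def status_page_exposed?(code, body)
  code.to_i == 200 && body.to_s.include?('Apache Server Status')
end

# Hypothetical helper: performs a real HTTP request against the given host.
def check_host(host)
  response = Net::HTTP.get_response(URI("http://#{host}/server-status"))
  status_page_exposed?(response.code, response.body)
end
```

Run check_host from a machine outside your own network, since the page is often restricted to local addresses and will look exposed from inside.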

Announcing mod_proxy_content 1.0

I’m pleased to announce the availability of mod_proxy_content, an Apache 2.x module that is essential for reverse proxying modern web content. It is a fork of mod_proxy_html 3.0.1, with bug fix patches through 3.1.2, enhanced to better handle CSS and JavaScript content without the need to create messy regular expressions in the Apache configuration files. It does not depend on mod_xml2enc, and so it has the same problems with non-ASCII/UTF-8 encodings that mod_proxy_html has without mod_xml2enc.

Generate wsesslog Workloads for httperf

Over the last couple of days I’ve been bringing up an isolated test environment for a customer’s new site. (As an aside, one of the great things about moving to an Intel Mac is that I can run nearly any OS I want under VMware Fusion at near-native speeds. You can’t beat testing in an identical environment, and I can throw pretty respectable virtual hardware at it, too: up to 4 cores with gigs of memory. If only Apple would let me virtualize OS X client.)

I’m using httperf to simulate client load on the test server and quickly decided that --wsesslog looked like the best choice for simulating an actual browser’s effect on the server.

A problem: how to generate those session workloads? I certainly don’t want to do this by hand for even one page. I want to generate a hit on every file referenced by the target page, but ignore anything hosted elsewhere.

A solution:

#!/usr/bin/env ruby
 
require 'rubygems'
require 'hpricot'
require 'open-uri'
 
if ARGV.length < 1
  $stderr.puts "usage: #{$0} url

  'url' must include the protocol prefix, e.g. http://, and a path
  (at least a trailing /)"
  exit 1
end
 
url = ARGV.shift
if url =~ %r{^(https?://)([-a-z0-9.]+(:\d+)?)(.*/)([^/]*)$}i
  $protocol = $1
  $host = $2
  $document_dir = $4
  document_url = $5
else
  $stderr.puts 'Could not parse protocol and host from URL'
  exit 1
end
 
# Kernel#open no longer fetches URLs on Ruby 3.0+; open-uri provides URI.open.
doc = Hpricot(URI.open(url))
 
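# Print uri as an indented burst entry if it is served by the target host;
# links to other hosts are skipped.
def puts_link(uri)
  return if uri.nil?

  # Escape the prefix so the dots in the host name match literally.
  if uri =~ %r{^#{Regexp.escape($protocol + $host)}(.*)$}
    # Absolute URL on the target host: keep only the path.
    puts "    #{$1}"
  elsif uri !~ %r{^(https?:)?//}
    # Not an absolute or protocol-relative (//host/...) link.
    if uri =~ %r{^/}
      # Root-relative path: use as-is.
      puts "    #{uri}"
    else
      # Document-relative path: prefix the page's directory.
      puts "    #{$document_dir}#{uri}"
    end
  end
end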
 
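puts "# httperf wsesslog for #{url} generated #{Time.now}"
puts

# The page itself starts the session; the indented lines that follow are
# treated by httperf as a burst requested after it.
puts "#{$document_dir}#{document_url}"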
 
(doc/"link[@rel='stylesheet']").each do |stylesheet|
  puts_link stylesheet.attributes['href']
end
 
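(doc/"style").each do |style|
  # A backreference cannot be used inside a character class, so match the
  # quoted URL non-greedily up to the same closing quote.
  style.inner_html.scan(/@import\s+(['"])(.+?)\1\s*;/).each do |match|
    puts_link match[1]
  end
end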
 
(doc/"script").each do |script|
  puts_link script.attributes['src']
end
 
(doc/"img").each do |img|
  puts_link img.attributes['src']
end
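
The "same host only" rule in puts_link can also be expressed with Ruby's standard URI library instead of regular expressions. This is just a sketch of that alternative, not part of the script above: local_path is a hypothetical helper name, and it assumes a link counts as local when it resolves to the same host and port as the page.

```ruby
require 'uri'

# Returns the request path for href if it resolves to the same host and port
# as page_url, or nil if it is missing, unparsable, or hosted elsewhere.
# (local_path is a hypothetical helper, not part of the original script.)
def local_path(page_url, href)
  return nil if href.nil? || href.strip.empty?

  base = URI.parse(page_url)
  # URI.join resolves relative, root-relative, and protocol-relative links.
  target = URI.join(page_url, href.strip)
  return nil unless target.is_a?(URI::HTTP) # also true for https URLs
  return nil unless target.host == base.host && target.port == base.port

  target.request_uri
rescue URI::InvalidURIError
  nil
end
```

Each non-nil result could be printed with four leading spaces, exactly as puts_link does, to produce the burst lines of the session.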