Fuzzing your database for fun and profit

Posted by Luke on Friday, January 25

Fuzz testing means throwing random data at your application and seeing what breaks. We don’t usually do that, but we often do need lots of semi-realistic data added to our development database.

This helps you:

  • see how things will look when there’s more content on the site.
  • nail down the indexes you’ll need (queries that run fine against 10 rows of fixture data can fall over on 10,000 rows of random data).

It’s possible to do this with fixtures and ERB, but I find it tedious. Plus, by going through Active Record directly, you can guarantee that the objects you’re inserting are valid: create! raises an exception instead of saving a record that fails validation.

First, create a new rake task in lib/tasks/fuzz.rake:

namespace :db do
  desc 'Insert some random posts'
  task :fuzz => :environment do
    if RAILS_ENV.downcase == "production"
      raise "You can't fuzz your production environment. Think of the children!"
    end
    
    Fuzz.execute(ENV['SIZE'].to_i)
    
  end
end

You’ll call this with rake db:fuzz SIZE=1000. You could put all the code in the rake file itself, but it’s a little easier to manage if you split it out into a separate class.
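The rake task does two small things before handing off: it refuses to run in production, and it turns ENV['SIZE'] into a number. Here is a plain-Ruby sketch of just that argument handling, so it can be tried without loading Rails. The helper names check_environment! and fuzz_size are hypothetical (they are not part of Rails or rake), and the environment name and raw SIZE string are passed in as arguments rather than read from globals.

```ruby
# Hypothetical helpers mirroring the checks in the db:fuzz task.

DEFAULT_FUZZ_SIZE = 100

# Raise unless it's safe to insert junk data into this environment.
def check_environment!(env_name)
  if env_name.to_s.downcase == "production"
    raise "You can't fuzz your production environment. Think of the children!"
  end
  true
end

# Convert the raw SIZE value to a positive count.
# nil.to_i and "abc".to_i are both 0, so a missing or non-numeric SIZE,
# like zero or a negative number, falls back to the default.
def fuzz_size(raw)
  size = raw.to_i
  size > 0 ? size : DEFAULT_FUZZ_SIZE
end

check_environment!("development")
puts fuzz_size("1000") # prints 1000
puts fuzz_size(nil)    # prints 100
```

Treating zero and negative values the same as a missing SIZE means a typo on the command line still produces a sensible amount of data instead of silently doing nothing.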

In lib/fuzz.rb, write something like this example, which picks a random user and adds a post from them, SIZE times over. Your fuzz script can do anything you want, though.

class Fuzz
  ActiveRecord::Base.establish_connection(RAILS_ENV.to_sym)

  # This file location varies by OS. This is the Mac OS X location.
  # At 2.4M, you have plenty of RAM to read it all into memory!
  # File.readlines closes the file once every line has been read.
  @@words = File.readlines("/usr/share/dict/words").collect do |line|
    line.strip
  end

  def self.execute(size)
    # ENV['SIZE'].to_i is 0 when SIZE isn't set, so fall back to 100.
    size = 100 if size.nil? || size <= 0

    ActiveRecord::Base.silence {
      User.transaction do
        size.times do
          # "rand()" is MySQL syntax; PostgreSQL and SQLite use "random()".
          user = User.find(:first, :order => "rand()")
          raise "Add some users before fuzzing posts" if user.nil?
          # 1 + rand(30) gives 1 to 30 words, so the body is never empty.
          user.posts.create!(:body => random_words(1 + rand(30)))
        end
        puts "Created #{size} posts"
      end
    }
  end

  # Provide a string with num random words in it.
  def self.random_words(num = 1)
    w = []
    num.times do
      w << @@words[rand(@@words.size)]
    end
    w.join(" ")
  end

end
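random_words doesn’t depend on Rails, so it’s easy to try on its own. Below is a standalone variant with two changes of my own, not from the original: it takes an explicit Random object, so a fixed seed reproduces the same fuzz text (handy when one random post breaks a page and you want it back), and it falls back to a tiny built-in list when no dictionary file exists. /usr/share/dict/words is the Mac OS X path used above and is common on Linux too; /usr/dict/words is an older Unix location. The class name WordSource is hypothetical.

```ruby
# Hypothetical standalone version of Fuzz.random_words.
class WordSource
  DICTIONARY_PATHS = ["/usr/share/dict/words", "/usr/dict/words"]
  FALLBACK_WORDS = %w[lorem ipsum dolor sit amet consectetur adipiscing elit]

  attr_reader :words

  # Pass a word list to skip reading the dictionary file.
  def initialize(words = nil)
    @words = words || load_dictionary
  end

  # Return a string of num words chosen with rng.
  # Passing Random.new(seed) makes the output repeatable.
  def random_words(num = 1, rng = Random.new)
    Array.new(num) { @words[rng.rand(@words.size)] }.join(" ")
  end

  private

  # Read the first dictionary file that exists, dropping blank lines.
  # scrub replaces invalid bytes so strip can't choke on odd encodings.
  def load_dictionary
    path = DICTIONARY_PATHS.find { |p| File.exist?(p) }
    return FALLBACK_WORDS unless path
    words = File.readlines(path).map { |line| line.scrub.strip }.reject(&:empty?)
    words.empty? ? FALLBACK_WORDS : words
  end
end

source = WordSource.new
puts source.random_words(5, Random.new(42))
```

Random.new requires Ruby 1.9.2 or later; on older Rubies you would need to seed Kernel#rand with srand instead.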

Silencing the logger and wrapping everything in a single transaction makes the code run noticeably faster, which matters when you’re inserting 10,000 of these. You can speed things up further by disabling timestamps, but I’ve found that causes more trouble than it’s worth, because you often want to use those timestamps in your app!

Extra credit: While the data generated from random dictionary words is often hilarious, it’s not very realistic. Use Faker to create more realistic fake data, and to randomize your non-required fields so they’re only filled in some of the time.
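Filling optional fields only some of the time doesn’t need Faker at all. Here is a minimal sketch assuming a post with a required body and two optional fields, title and summary; those field names, the probabilities, and the helpers sometimes and fuzz_post_attributes are all hypothetical. Plain random words stand in for Faker here, so the example runs without any gems.

```ruby
# Hypothetical attribute builder for a fuzzed post. Swap in your own
# model's optional columns and probabilities.

# Return the block's value with the given probability, otherwise nil.
def sometimes(probability, rng)
  rng.rand < probability ? yield : nil
end

def fuzz_post_attributes(rng, words)
  # Build a string of n random words from the list.
  pick = ->(n) { Array.new(n) { words[rng.rand(words.size)] }.join(" ") }
  {
    body:    pick.call(1 + rng.rand(30)),          # required: always 1 to 30 words
    title:   sometimes(0.8, rng) { pick.call(3) }, # optional: present about 80% of the time
    summary: sometimes(0.3, rng) { pick.call(12) } # optional: present about 30% of the time
  }
end

rng = Random.new(7)
words = %w[fuzz data rake task random post]
p fuzz_post_attributes(rng, words)
```

In the Fuzz class you would pass a hash like this straight to user.posts.create!; a nil value simply leaves that column empty, so your views get exercised both with and without the optional data.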