Fuzzing your database for fun and profit

Posted by Luke
on Friday, January 25

Fuzz testing is throwing random data at your application and seeing what breaks. We don’t usually do that. But we often do need lots of semi-realistic data added to our development database.

This helps you:

  • see how things will look when there’s more in the site.
  • nail down the indexes you’ll need (Queries that run fine with 10 rows of fixture data fall down on 10,000 rows of random data).

It’s possible to do this with fixtures and ERB but I find it tedious. Plus by using Active Record directly you can guarantee that the objects you’re inserting are valid.

First, create a new rake task in lib/tasks/fuzz.rake:

namespace :db do
  desc 'Insert some random posts'
  task :fuzz => :environment do
    if RAILS_ENV.downcase == "production"
      raise "You can't fuzz your production environment. Think of the children!"
    end
    
    Fuzz.execute(ENV['SIZE'].to_i)
    
  end
end

You’ll call this with rake db:fuzz SIZE=1000. You can actually put all the code in the rakefile, but it’s a little easier to manage if you split it out into a separate class.
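
One thing to watch with SIZE: ENV['SIZE'].to_i quietly turns a missing or mistyped value into 0. Here is a minimal sketch of a stricter parser, assuming you would rather fail loudly on junk input. The fuzz_size helper and DEFAULT_FUZZ_SIZE constant are hypothetical names, not part of the task above.

```ruby
# Hypothetical helper (not part of the original rake task): turn the SIZE
# environment variable into a positive record count.
DEFAULT_FUZZ_SIZE = 100

def fuzz_size(env = ENV, default = DEFAULT_FUZZ_SIZE)
  raw = env['SIZE']
  # No SIZE given: fall back to the default.
  return default if raw.nil? || raw.strip.empty?

  # Integer() raises ArgumentError on junk such as "lots",
  # whereas String#to_i would silently return 0.
  size = Integer(raw.strip, 10)
  raise ArgumentError, "SIZE must be positive, got #{size}" unless size > 0
  size
end
```

In the task you could then call Fuzz.execute(fuzz_size) instead of passing ENV['SIZE'].to_i straight through.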

In lib/fuzz.rb, write something like this example, which finds a random user and adds a post from them to the system SIZE times. The fuzz script could do anything you want, though.

class Fuzz
  ActiveRecord::Base.establish_connection(RAILS_ENV.to_sym)

  # This file location varies by OS. This is the Mac OS X location.
  # At 2.4M, you have plenty of RAM to read it all into memory!
  # File.readlines closes the file when it is done reading.
  @@words = File.readlines("/usr/share/dict/words").collect do |line|
    line.strip
  end
  
  def self.execute(size)
    if size == 0 or size.nil?
      size = 100
    end

    ActiveRecord::Base.silence {
      User.transaction do
        size.times do
          # rand() is MySQL syntax; PostgreSQL and SQLite use random()
          user = User.find(:first, :order => "rand()")
          user.posts.create!(:body => random_words(rand(30)))
        end
        puts "Created #{size} posts"
      end
    }
  end
  
  # provide a string with num random words in it.
  def self.random_words(num = 1)
    w = []
    num.times do
      w << @@words[rand(@@words.size)]
    end
    w.join(" ")
  end
  
end

Silencing the logger and wrapping the inserts in a single transaction make the code run faster, which matters when you’re creating 10,000 records. Another thing you can do to speed things up is disable timestamps, but I’ve found that causes more trouble than it’s worth, because you often want to use those timestamps in your app!
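
A portability note on the Fuzz class: the dictionary path is hard-coded, and as the comment in the code says, the location varies by OS. Here is a rough sketch of a loader that tries a few candidate paths and falls back to a small built-in list. The second path, the fallback words, and the load_words name are all assumptions for illustration.

```ruby
# Candidate dictionary locations. These paths are assumptions and vary by OS.
DICTIONARY_PATHS = [
  "/usr/share/dict/words",
  "/usr/dict/words"
].freeze

# Tiny built-in list so the script still runs where no dictionary is installed.
FALLBACK_WORDS = %w[apple river stone cloud lantern meadow copper violet].freeze

# Returns the words from the first readable path, or a copy of the fallback list.
def load_words(paths = DICTIONARY_PATHS, fallback = FALLBACK_WORDS)
  path = paths.find { |p| File.file?(p) && File.readable?(p) }
  return fallback.dup if path.nil?

  # Strip newlines and drop blank lines.
  words = File.readlines(path).map { |line| line.strip }.reject { |w| w.empty? }
  words.empty? ? fallback.dup : words
end

# Builds a string of num random words drawn from the given list.
def random_words(words, num = 1)
  Array.new(num) { words.sample }.join(" ")
end
```

You would load the list once, for example words = load_words, and then pass it to random_words(words, rand(30)) inside the loop.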

Extra credit: While the data generated from random dictionary words is often hilarious, it’s not very realistic. Use Faker to create more realistic fake data and sometimes to randomize those non-required fields.
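
The "sometimes" idea can be sketched in plain Ruby, without Faker. This is only a guess at the shape of such a helper, not the linked code. The attribute names (:login, :bio, :website) and the random_profile_attributes method are made up for illustration.

```ruby
# Yields with the given probability (0.0..1.0) and returns the block's value,
# or nil when the block is skipped. rng is injectable so runs can be seeded.
def sometimes(probability = 0.5, rng = Random.new)
  yield if rng.rand < probability
end

# Hypothetical attribute names; adapt them to your own model.
def random_profile_attributes(rng = Random.new)
  attrs = { :login => "user#{rng.rand(100_000)}" } # required field, always set
  # Optional fields are filled in only some of the time.
  sometimes(0.3, rng) { attrs[:bio] = "Bio #{rng.rand(1000)}" }
  sometimes(0.1, rng) { attrs[:website] = "http://example.com/#{rng.rand(1000)}" }
  attrs
end
```

Passing a seeded generator, such as Random.new(42), gives the same "random" data on every run, which helps when you are chasing a bug that only shows up with one particular data set.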

Comments

  1. Brandon Arbini, January 25, 2008 @ 04:02 PM

    We needed to do something similar, but used a slightly different approach. Faker is decent, but we are working on something a little more configurable called Forgery. Hopefully you’ll find it useful.

  2. James, January 29, 2008 @ 12:55 PM

    A really useful function in fuzzing databases is the following:

    
    def with_probability(prob, &block)
      block.call if rand <= prob
    end
    

    With that, you can do things like

    
    u = User.create(...)
    with_probability(1/2.0) do
      stuff_that_only_some_users_should_have
    end
    
  3. James, January 29, 2008 @ 01:50 PM
    Or, even better:
    
    Object.class_eval do
      def with_probability(prob, &block)
        if rand <= prob
          block.call 
          return ProbabilisticDoer::Done.new
        else
          return ProbabilisticDoer::NotDone.new
        end
      end
    end
    
    module ProbabilisticDoer
      class NotDone
        def else_with_probability(prob, &block)
          with_probability(prob, &block)
        end
        def else(&block)
          with_probability(1, &block)
        end
      end
    
      class Done
        def else_with_probability(prob, &block)
          # do nothing, but return self so a later .else can still be chained
          self
        end
        def else(&block)
          # do nothing
        end
      end
    end
    
    With that, you can do things like
    
    u = User.create(...)
    with_probability(9/10.0) do
      u.stuff_that_most_users_should_do
    end
    
    with_probability(0.01) do
      u.stuff_that_very_few_users_should_do
    end.else_with_probability(0.2) do
      u.stuff_that_some_but_none_of_the_above_users_should_do
    end.else do
      u.stuff_the_rest_of_the_users_should_do
    end
    
  4. Luke Francl, January 29, 2008 @ 11:23 PM

    Cool stuff guys!

    The with_probability code is similar to the sometimes pastie I linked to, but yours has more options.

  5. James, January 30, 2008 @ 09:11 PM

    I have put all my work together into a gem on RubyForge. Just

    sudo gem install nondeterminism
  6. Luke Francl, January 30, 2008 @ 09:41 PM

    Nice. I will give it a try the next time I am playing around with this.

  7. Chris Moore, February 07, 2008 @ 09:41 PM

    Here’s my version, with faker’s help.