Auto-escaping HTML with Rails

Posted by Luke Francl
on Monday, January 28

One of the things I don’t like about Rails is that it doesn’t auto-escape HTML in user input. Forget one h call in your template and you’re screwed. Worse yet, before Rails 2.0, strip_tags and sanitize were flawed. Fortunately that’s been fixed. Django added auto-escaping even though it was a backwards incompatible change, but so far there doesn’t seem to be similar movement on the Rails front.
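To make the failure mode concrete, here’s a minimal sketch in plain Ruby of what an h call does to a string. It uses ERB::Util.html_escape from the standard library, which as far as I know is the method behind h in Rails views. The comment string is a made-up example of hostile input.

```ruby
require 'erb'

# Hypothetical user input containing a script tag.
comment = "<script>alert(document.cookie)</script>"

# Roughly what <%= comment %> emits: the markup goes out untouched.
unescaped = comment

# Roughly what <%= h(comment) %> emits: angle brackets become entities.
escaped = ERB::Util.html_escape(comment)

puts unescaped
puts escaped
```

The first line printed is live markup the browser would run. The second is inert text.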

But I’m all about automating manual processes. So let’s fix this problem.

Sanitize before saving or before displaying? Or both?

Should you sanitize text before saving it or before displaying it?

If you sanitize before saving, you don’t need to worry about doing anything extra in your views. However, if a field escapes your notice, you may be open to attack.

I think your first line of defense should be model-level sanitization, but auto-escaping HTML is good backup. Doing both covers your bases at a cost of extra processing.

Introducing xss_terminate

xss_terminate is a plugin that makes stripping and sanitizing HTML stupid-simple. It’s install and forget. And you can forget about forgetting to h() your output, because you won’t need to anymore. It’s based on acts_as_sanitized by Alex Payne but updated for Rails 2.0, and with some new features.

I like acts_as_sanitized but it’s not being maintained any more so Alex gave me the OK to take his code and do something different with it. Here’s what makes xss_terminate different:

  • It works with Rails 2.0.
  • It’s automatic. It is included with default options in ActiveRecord::Base so all your models are sanitized. Period.
  • It works with migrations. Columns are fetched when the model is saved, not when the class is loaded.
  • You can decide whether to sanitize or strip tags on a field-by-field basis instead of model-by-model.
  • HTML5lib support if Rails’s HTML parser isn’t doing it for you.

Here’s how you use it.

To install: script/plugin install http://xssterminate.googlecode.com/svn/trunk/xss_terminate

Strip HTML tags from all the fields in a model

class Article < ActiveRecord::Base
end

Done. All models have tags stripped by default.

Sanitize HTML from some fields

class Article < ActiveRecord::Base
  xss_terminate :sanitize => [:body]
end

Use HTML5lib to sanitize HTML from some fields

HTML5lib is a new library for parsing HTML for Python and Ruby. Its goal is to parse HTML like browsers do, so it’s very fault-tolerant. If you want to use it, gem install html5 and use the :html5lib_sanitize option. This is thanks to code by Jacques Distler.

class Article < ActiveRecord::Base
  xss_terminate :html5lib_sanitize => [:body]
end

But I don’t want to strip HTML at all from that field!

class Article < ActiveRecord::Base
  xss_terminate :except => [:title, :body]
end

Putting it all together

And of course, you can put these options together. Remember, fields are stripped of tags by default, so that’s assumed unless you override it.

class Article < ActiveRecord::Base
  xss_terminate :except => [:author_name], :sanitize => [:title], :html5lib_sanitize => [:body]
end

Report bugs at the xss_terminate Google Code site.

Extra credit: Use Erubis

Erubis catches 80% of HTML escaping screw ups by making them impossible. You can use it in conjunction with xss_terminate or other XSS plugins to give yourself an extra layer of protection. (See our post on setting up Erubis with Rails 2.0.)

With Erubis, code like <%= "<script>alert('pwnd')</script>" %> can be auto-escaped.

However, all Rails helpers which generate HTML must be called with <%== %> so the HTML is not escaped. This leaves an opening for attacks like this:

<%== link_to user.name, "/some/url" %>

If user.name contains XSS you’re pwnd.

So while Erubis is a marked improvement over ERB, it’s not a cure-all. That’s why I like to use both approaches.
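Here’s a rough sketch of that hole in plain Ruby. naive_link_to is a hypothetical stand-in for an HTML-generating helper that doesn’t escape its link text; it isn’t Rails’s actual link_to. The user name is invented for illustration.

```ruby
require 'erb'

# Hypothetical stand-in for a helper that builds HTML and does not
# escape the link text it is given.
def naive_link_to(name, url)
  %(<a href="#{url}">#{name}</a>)
end

# Hypothetical malicious value for user.name.
user_name = "<script>alert(1)</script>"

# Emitted raw, as <%== ... %> would: the script tag survives.
unsafe = naive_link_to(user_name, "/some/url")

# Escaping the user-supplied piece first keeps the helper's own
# markup intact while neutralizing the name.
safe = naive_link_to(ERB::Util.html_escape(user_name), "/some/url")

puts unsafe
puts safe
```

In a template, the equivalent fix would look something like <%== link_to h(user.name), "/some/url" %>, which puts you right back to remembering to call h().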

Other approaches

There’s been a lot of discussion about Rails and XSS lately, so I’m hopeful that the situation will get better. Here’s a couple other XSS protection projects you can check out:

  • SafeERB – Throws exceptions if you try to display tainted strings. Call h() to untaint.
  • xss-shield – automatically h() strings unless marked as “safe”.
  • sanitize_params – strip HTML from your parameters before they hit your models.
  • AntiSamy – another whitelist-based approach (not available for Rails)

Also, check out Is your Rails App XSS Safe? and Never Untaint by Stu Halloway and Jacques Distler’s posts about making Instiki XSS-safe: XSS and XSS 2 (these are must read).

Fuzzing your database for fun and profit

Posted by Luke Francl
on Friday, January 25

Fuzz testing is throwing random data at your application and seeing what breaks. We don’t usually do that. But we often do need lots of semi-realistic data added to our development database.

This helps you:

  • see how things will look when there’s more in the site.
  • nail down the indexes you’ll need (Queries that run fine with 10 rows of fixture data fall down on 10,000 rows of random data).

It’s possible to do this with fixtures and ERB but I find it tedious. Plus by using Active Record directly you can guarantee that the objects you’re inserting are valid.

First, create a new rake task in lib/tasks/fuzz.rake:

namespace :db do
  desc 'Insert some random posts'
  task :fuzz => :environment do
    if RAILS_ENV.downcase == "production"
      raise "You can't fuzz your production environment. Think of the children!"
    end
    
    Fuzz.execute(ENV['SIZE'].to_i)
    
  end
end

You’ll call this with rake db:fuzz SIZE=1000. You can actually put all the code in the rakefile, but it’s a little easier to manage to split it out into a separate class.

In lib/fuzz.rb, write something like this example, which finds a random user and adds a post from them to the system SIZE times. The fuzz script could do anything you want, though.

class Fuzz
  ActiveRecord::Base.establish_connection(RAILS_ENV.to_sym)

  # This file location varies by OS. This is the Mac OS X location.
  # At 2.4M, you have plenty of RAM to read it all into memory!
  @@words = File.readlines("/usr/share/dict/words").collect do |line|
    line.strip
  end
  
  def self.execute(size)
    if size == 0 or size.nil?
      size = 100
    end

    ActiveRecord::Base.silence {
      User.transaction do
        size.times do
          user = User.find(:first, :order => "rand()")
          user.posts.create!(:body => random_words(rand(30)))
        end
        puts "Created #{size} posts"
      end
    }
  end
  
  # provide a string with num random words in it.
  def self.random_words(num = 1)
    w = []
    num.times do
      w << @@words[rand(@@words.size)]
    end
    w.join(" ")
  end
  
end

Silencing the logger and using a transaction makes the code execute faster, which matters when you’re creating 10,000 records. Another thing you can do to speed things up is disable timestamps, but I’ve found that causes more trouble than it’s worth, because you often want to use those timestamps in your app!
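Two wrinkles in the script above: rand(30) can return 0, which gives you an empty body (and a failed create! if the model validates presence), and the dictionary file isn’t in the same place on every OS. Here’s a minimal standalone sketch of a word generator that handles both. FALLBACK_WORDS and load_words are names I made up for this example, and it assumes Ruby 2.4 or later for the chomp option.

```ruby
# Hypothetical fallback list for systems without a dictionary file.
FALLBACK_WORDS = %w[apple river stone cloud maple ember harbor violet]

# Load one word per line from path, or fall back to the built-in list
# when the file is missing or empty.
def load_words(path = "/usr/share/dict/words")
  return FALLBACK_WORDS unless File.readable?(path)
  words = File.readlines(path, chomp: true).reject { |word| word.empty? }
  words.empty? ? FALLBACK_WORDS : words
end

# Build a string of between 1 and max random words, so it is never empty.
def random_words(words, max = 30)
  count = 1 + rand(max)
  Array.new(count) { words[rand(words.size)] }.join(" ")
end

words = load_words
puts random_words(words, 30)
```

Passing the word list in as an argument, instead of reading it from a class variable, also makes the generator easy to try out in irb without loading Rails.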

Extra credit: While the data generated from random dictionary words is often hilarious, it’s not very realistic. Use Faker to create more realistic fake data and sometimes to randomize those non-required fields.

Managing Migrations

Posted by Luke Francl
on Monday, January 14

This is part of a series in which I am exploring the best practices for working together on a Rails team in advance of my acts_as_conference presentation. Earlier, I wrote about working together effectively and argued that bigger projects require better tools.

Migrations are great, but they are not without problems. Here’s some I’ve run into and how to fix or avoid them.

Migration Decay

You’ve been assigned to fix some bugs in a project your company built a while ago. You update the code and then run rake db:migrate.

rake aborted! (See full trace by running task with --trace)

D’oh.

I call this seemingly inevitable breakdown “migration decay.”

This goes against the general principles we’ve been talking about. It’s not automatic. It adds barriers when you add new members to a project, stressing the communications channels.

Note that this schema.rb definition is the authoritative source for your database schema. If you need to create the application database on another system, you should be using db:schema:load, not running all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations you’ll amass, the slower it’ll run and the greater likelihood for issues).

db/schema.rb

Ideally, I’m for using migrations and keeping them running from version 0 to version n. Migrations are cool because they can do it all: change your schema, load your seed data, and modify data on the server.

But as your models change, the old migrations don’t. And so it’s very likely that you’ll end up with migrations that stop working.

With enough work, you can probably keep them running—most of the time. But I’ve just been in too many situations where the migrations have failed. Worse is if they fail half-way in, because the migration number’s not incremented, but half of it has been applied. So the next run will fail, too, and then you have to fix the database manually. Bleh.

This has led me to re-evaluate the use of schema.rb. Since schema.rb can’t load data, that means you need to separate migrations and data loading.

Re-organizing migrations

I’ve worked on a couple of projects where the existing migrations were consolidated into a single migration at the end of the development cycle.

This seems like too much trouble to me. Especially if we are giving up on migrations as the way to create the database. Who cares how many there are in that case? Plus you have to screw around with the server to re-set the schema_info table.

rake db:reset

On my latest project, I have been editing my migrations as needed and then re-creating the database with rake db:reset db:migrate.

I can only do this because it’s a small project and we haven’t deployed yet. But it does keep the migrations much cleaner and easier to understand.

Conflicting migrations

Anyone ever have this happen?

Chances are if you’ve worked on a project with more than a couple developers, you’re going to get conflicting migrations checked into your source control. And the more developers you have, the more time your whole team will waste cleaning up afterwards. Branches present another problem for migrations.

The root problem is that migrations layer another level of version control on top of your SCM. For a really insightful look at this thorny problem, I recommend reading the Django project’s Schema Evolution wiki page, particularly this part. (No, they haven’t solved it either.)

Some potential solutions:

  1. Use an SCM hook to prevent checking in conflicting migrations. There is a pre-commit hook that does this for Subversion. This wouldn’t work for branches, though.
  2. Use a plugin that extends migrations so they aren’t based (exclusively) on numbers. ELC Technologies has a plugin called Duplicate Migrations that allows this. There have also been a few other attempts, most notably Enhanced Migrations from Revolution Health and Independent Migrations by Courtenay.
  3. Write all the changes directly in schema.rb and let Auto-migrations take care of the database while your SCM keeps track of the merging.
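To give a feel for the first option, here’s a minimal sketch of the kind of check such a hook could run. This is not the Subversion hook mentioned above, just my own illustration. It assumes the numbered naming scheme (001_create_users.rb and so on), and the file list is hypothetical.

```ruby
# Group migration filenames by their numeric prefix and return only the
# versions claimed by more than one file.
def conflicting_migrations(filenames)
  by_version = Hash.new { |hash, key| hash[key] = [] }
  filenames.each do |name|
    base = File.basename(name)
    # Skip anything that doesn't look like NNN_description.rb.
    next unless base =~ /\A(\d+)_.+\.rb\z/
    by_version[$1.to_i] << base
  end
  by_version.select { |_version, files| files.size > 1 }
end

# Hypothetical file list; a real check might use Dir["db/migrate/*.rb"].
files = %w[
  db/migrate/001_create_users.rb
  db/migrate/002_create_posts.rb
  db/migrate/003_add_bio_to_users.rb
  db/migrate/003_add_title_to_posts.rb
]

conflicting_migrations(files).each do |version, names|
  puts "Version #{version} is used by: #{names.join(', ')}"
end
```

A hook would reject the commit whenever this returns anything. Since it only sees the files in one working copy, it shares the same blind spot for branches.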

I find the last option attractive, but I’m not sure I’d trust it for production use.

I really like the Duplicate Migrations plugin. Not only can developers concurrently change the database without problems, but development could continue on a maintenance branch without causing problems for the mainline. I plan to use it on my next big project.

I’ve also used the SCM hook approach with great success (within a single branch of development). There, the person who checks in their code “last” has to fix the problem, so it costs them some time. But it doesn’t kill everyone else’s time, too.

Your $0.02 here…

So, how do you deal with migrations? Let me know in the comments. Thanks!

Bigger projects require better tools

Posted by Luke Francl
on Sunday, January 13

We’ve seen how having more developers on a project increases the number of communication channels dramatically.

Project size also has a direct impact. To quote Code Complete:

Project size is easily the most significant determinant of effort, cost and schedule [for a software project]. People naturally assume that a system that is 10 times as large as another system will require something like 10 times as much effort to build. But the effort for a 1,000,000 LOC system is more than 10 times as large as the effort for a 100,000 LOC system.

For any non-trivial program, it is impossible to keep the whole thing in your head at once, no matter how smart you are. So in order to build software, we have to constantly battle against complexity. We split things into pieces; we automate processes so we don’t have to think about them; we document so we can remember later or delegate to others.

When it’s just me working on a project, I can track bugs on a piece of paper; there are never any problems with merges or conflicting migrations; and I know exactly who broke the test cases. As the number of developers increases, I need better tools. I need a real bug tracker, continuous integration to run the tests, and a way to deal with conflicting database changes.

What does this have to do with Rails? My goal is to look at how we can smooth over the bumpy edges of building a Rails project. These are the things that trip you up, that break your flow.

  • Managing migrations
  • Loading your initial data
  • Managing third-party code
  • Perhaps more…

Finally, I’ll try to answer the question: does it even matter? That is, will you get enough of a productivity boost to make these techniques worth it?