Working with others: Best practices for Rails teams

Posted by Luke Francl
on Friday, February 08

This is the HTML version of the handout from my acts_as_conference presentation. If and when they post the audio of the talk, I’ll upload my slides; they wouldn’t make much sense without it. —Luke

Software is hard. Why? Fred Brooks separated software development into essence (the conceptual modeling) and accident (actually building it). As building software has gotten easier due to better tools, the essence has remained difficult. One of the best things about Rails is that it lets fewer people get more done. This helps because adding more people to a project doesn’t scale. But Rails teams still run into problems. What are they and how can we make Rails development easier?

Migrations

Having a way to evolve your database schema baked into the framework is a huge advantage. But migrations are also a source of pain for Rails teams.

Migration conflicts happen when two people check in a migration with the same number, and then everyone on the team has to manually fix their database. You can fix this with a pre-commit hook or use a plugin that allows duplicate migrations.

I call the seemingly inevitable tendency of migrations to stop working migration decay. You can fight it, if you try hard enough (continuous integration is your friend here). Or you can give up:

Note that this schema.rb definition is the authoritative source for your database schema. If you need to create the application database on another system, you should be using db:schema:load, not running all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations you’ll amass, the slower it’ll run and the greater likelihood for issues). —db/schema.rb

Seed data

Migrations seem great for loading data, because they run automatically. However, touching your models in your migrations increases the chances that they’ll break. Whatever you do, don’t use db:fixtures:load for this—that’s for your tests.

Solution: use a rake task that loads seed data. Jeffrey Allan Hardy wrote a nice task to do this using fixtures. I don’t like fixtures for loading seed data because they don’t validate data, so I use db-populate by Josh Knowles, along with ActiveRecord::Base.create_or_update:

def self.create_or_update(options = {})
  id = options.delete(:id)
  record = find_by_id(id) || new
  record.id = id
  record.attributes = options
  record.save!
    
  record
end

Managing third-party code

The best way to reduce the difficulty of writing code is to not write it. Ruby’s great library of gems and the hundreds of Rails plugins help us write less code. However, managing third-party code is a pain. If a developer installs a gem, everyone else needs it. We used to have problems with this, but now we vendor everything. Dr. Nic’s gemsonrails makes vendoring gems easy—except for gems that must be natively compiled.

Don’t install plugins with svn:externals unless you want to rely on Joe Bob’s Random Subversion Server to be up when you’re deploying. Don’t edit plugins, unless you’ve got a good SCM or use piston. Even then, it may be better to keep your changes as monkey patches in the lib directory.

Security should be on by default

In some areas, you have to screw up to get insecure code. For example, it’s easier to use secure SQL queries than not. And cross-site request forgery protection is baked in.

Not so with cross-site scripting (XSS). You have to remember to use h() in your views. If you forget once, your site could get hacked. My xss_terminate plugin solves this by stripping HTML from all strings when the model is saved (you can override this for attributes that need HTML). We also use Erubis to auto-escape HTML in views.
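
For reference, h() in Rails views is ERB's html_escape helper, which ships with Ruby's standard library. Here is a minimal sketch of what it does to a hostile string; the comment text is a made-up example, not something from a real application:

```ruby
require "erb"

# A hypothetical user-submitted comment containing markup.
comment = "<script>steal()</script> & more"

# ERB::Util.html_escape is the method behind h(); it replaces the
# characters that are significant in HTML with entities.
escaped = ERB::Util.html_escape(comment)

puts escaped
```

The point of the plugins above is that nobody has to remember to make this call in every view.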

(Other XSS plugins include: Cross Site Sniper, sanitize_params, SafeERB, and xss-shield. If you don’t like xss_terminate, try one of the others!)

Mass assignment code like LineItem.new(params[:line_item]) may set attributes (like total_price) you don’t anticipate—and a malicious user could end up getting charged $0.01 for a MacBook Pro. Protect your attributes with attr_protected. Better yet, use attr_accessible to create a white list of explicitly allowed attributes. Best yet, do this for all models by default. We added this code in an initializer to protect all the attributes:

ActiveRecord::Base.send(:write_inheritable_attribute, "attr_accessible", [])

Then we set attr_accessible to override the protection. This will wreak havoc on an existing code base, but I think it’s a good policy for new sites.
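
To make the white list idea concrete, here is a plain Ruby sketch of the filtering that attr_accessible gives you. It is an illustration only, not how ActiveRecord implements it; the ACCESSIBLE list and the accessible_attributes helper are hypothetical names:

```ruby
# Hypothetical white list for a line item; total_price is deliberately absent.
ACCESSIBLE = [:product_id, :quantity].freeze

# Keep only white-listed keys and drop everything else.
def accessible_attributes(params, allowed = ACCESSIBLE)
  params.select { |key, _value| allowed.include?(key.to_sym) }
end

# What a malicious form submission might look like.
submitted = { :product_id => 42, :quantity => 1, :total_price => 0.01 }

safe = accessible_attributes(submitted)
p safe
```

With an empty white list as the default, every attribute is dropped until a model explicitly opts in.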

Source control management: the heart of your project

Code is communication, and the way code has changed over time is communication, too. I rely on annotate, log, and diff extensively when figuring out why code works the way it does. To help your team members, you must write informative log messages: what changed, why, and the bug number (if applicable). Commit atomic changes; don’t patch bomb a bunch of unrelated code.

Bug tracking

The bigger your team, the more important a good bug tracking system is. What makes a good bug tracker? The features I look for are: workflow with open, closed, and resolved states; e-mail integration; SCM integration; and shared saved searches. The most important feature is ease of use, because if it’s not easy, people won’t use it.

Continuous integration

Continuous integration (CI) builds your software every time someone makes a change. CI runs tests, but more than that it ties everything together. It simulates a deployment: checks out your code, creates the database and loads the seed data, and ensures all the necessary libraries are there. If you broke something, CI will let you know right away. CI takes a while to set up and may be overkill for small teams, but on larger teams, CI is especially helpful.

Does it matter?

The practices above smooth over rough patches, automate processes, and manage communication. For the most part, they attack the accident of Rails development. But the majority of what makes software hard is figuring out what to build: the essence. Fred Brooks argues that to make real improvements in software productivity, we must attack the essence. To get at the essence, do what you can to limit complexity: use existing open source or packaged software; do less (a la 37Signals); split complicated projects into smaller pieces.

Iterative development

Iterative development is the most important software development technique. You can write tests all day long, but if you’re building the wrong thing, they won’t help you. With iterative development, your understanding of what you are trying to build grows with time and feedback from the customer (if you aren’t getting regular feedback, it’s not iterative). There is a difference between incremental and iterative. Iteration is the process of continuous refinement; incremental is building in stages. We try to do both: build smaller pieces, and iteratively refine the software.

Loading seed data

Posted by Luke Francl
on Thursday, January 31

At acts_as_conference next week (there’s still room to register) I’m going to be talking about challenges facing Rails teams. Today, I’d like to talk about loading your application’s seed data.

Seed data?

Seed data is anything that must be loaded for an application to work properly. An application needs its seed data loaded in order to run in development, test, and production.

Examples include everything from an initial administrator account to small enumerations to huge amounts of data (one example of seed data given by a developer on the Ruby Users of Minnesota included every airport in the world).

Seed data is mostly unchanging. It typically won’t be edited in your application. But requirements can and do change, so seed data may need to be reloaded on deployed applications.

The ideal solution would be automatic: you shouldn’t have to think about it. When you check out the code and start up your app, it should be ready. It should provide data integrity: the created records should pass your validations. And it should be easy to update your seed data.

Migrations

Since migrations are just Ruby code, they can be used to initialize data in the up method. This is demonstrated in the Rails documentation:

class AddSystemSettings < ActiveRecord::Migration
  def self.up
    create_table :system_settings do |t|
      t.string  :name
      t.string  :label
      t.text    :value
      t.string  :type
      t.integer :position
    end

    SystemSetting.create :name => "notice", :label => "Use notice?", :value => 1
  end

  def self.down
    drop_table :system_settings
  end
end

Using migrations is attractive because they get run automatically.

However, they have some downsides. Adding or changing data is troubling. Adding a new migration seems annoying. But going back into your old migrations to change your data won’t work either.

The biggest problem with using migrations to load seed data is “migration decay.” The more migrations you have, the less likely the older ones are to work. If your migrations load data, they are more likely to break as your models change.

Furthermore, the movement in the Rails community is that schema.rb is the authoritative source of your DB schema, and that new databases should be created using that:

Note that this schema.rb definition is the authoritative source for your database schema. If you need to create the application database on another system, you should be using db:schema:load, not running all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations you’ll amass, the slower it’ll run and the greater likelihood for issues).

That means no data loading migrations can be run.

Fixtures

At first glance, fixtures seem well suited for loading data. And because of that, a lot of projects go down the primrose path of using them—usually with poor results.

There are two ways to use fixtures to load seed data.

First, simply use test fixtures with rake db:fixtures:load. This is almost certainly a mistake. Your test fixtures will contain data not necessary for your application.

Second, create a separate set of fixtures, unrelated to your tests, and load those. Jeffrey Allan Hardy has a good post about how to use fixtures to load seed data. This is better, but I don’t like fixtures because they don’t validate data. It’s way too easy to end up with broken models.

One caveat about seed data, fixtures, and tests: If you use fixtures for tests, your data is deleted and the fixtures loaded. So your fixed seed data needs to be duplicated in the fixtures.

Fixture scenario builder

I haven’t used this one myself, but a number of people on the RUM list recommended using Chris Wanstrath’s Fixture Scenario Builder as a way to use fixtures without sucking (see above).

The Fixture Scenario Builder, uh, builds on Fixture Scenarios, letting you define them in Ruby (so they’re valid) and then generating fixture files for loading. Most people use this for test cases, but it can be used to load your initial data as well.

ActiveRecord::Base loader

If only there were some way to create records that were valid. Oh wait, ActiveRecord does this. Why not write a task that loads the seed data with ActiveRecord?

You’d have to make sure this gets run whenever you set up a new application. Josh Knowles has created db-populate to facilitate this approach. It provides a db:populate rake task that will run Ruby files in the db/fixtures directory.
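
The core of a task like that is small. The sketch below is my guess at the general shape, not db-populate's actual code; load_seed_files is a hypothetical name, and a temporary directory stands in for db/fixtures:

```ruby
require "tmpdir"

# Load every Ruby file in dir, in filename order, and return the paths loaded.
# Sorting makes the load order predictable across machines.
def load_seed_files(dir)
  Dir.glob(File.join(dir, "*.rb")).sort.each { |path| load path }
end

# Demo against a throwaway directory standing in for db/fixtures.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "venues.rb"), "puts 'loading venues'\n")
  File.write(File.join(dir, "users.rb"), "puts 'loading users'\n")

  loaded = load_seed_files(dir)
  puts loaded.map { |path| File.basename(path) }.inspect
end
```

Because the files are ordinary Ruby, they can call your models directly, which is what makes validation possible.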

Here’s a helper method that makes it easy to create or update records, so it can be run regardless.

class ActiveRecord::Base
  # given a hash of attributes including the ID, look up the record by ID. 
  # If it does not exist, it is created with the rest of the options. 
  # If it exists, it is updated with the given options. 
  #
  # Raises an exception if the record is invalid to ensure seed data is loaded correctly.
  # 
  # Returns the record.
  def self.create_or_update(options = {})
    id = options.delete(:id)
    record = find_by_id(id) || new
    record.id = id
    record.attributes = options
    record.save!
    
    record
  end
end

You can use it like this (in db/fixtures/venues.rb):

Venue.create_or_update(:id => 1, :name => "Coffman Union")
Venue.create_or_update(:id => 2, :name => "Alumni Center")

If you need to change the data, just edit the file:

Venue.create_or_update(:id => 1, :name => "Coffman Union")
Venue.create_or_update(:id => 2, :name => "McNamara Alumni Center")
Venue.create_or_update(:id => 3, :name => "Lind Hall")

I like this approach. The data is validated by ActiveRecord. It’s easy to update, and you can add it to your deploy recipe to make it automatic.

Loading lots and lots of data

I’ve read that both fixtures and ActiveRecord data loaders are too slow if you have lots of data (see Tonkatsufan’s comment here). In that case, the best thing to do is use your database’s preferred method of batch loading SQL inserts.

Your method here?

So that’s my survey of the available methods of loading seed data. I’m interested to hear what other people out there are doing. How do you do it?

Managing Migrations

Posted by Luke Francl
on Monday, January 14

This is part of a series in which I am exploring the best practices for working together on a Rails team in advance of my acts_as_conference presentation. Earlier, I wrote about working together effectively and argued that bigger projects require better tools.

Migrations are great, but they are not without problems. Here’s some I’ve run into and how to fix or avoid them.

Migration Decay

You’ve been assigned to fix some bugs in a project your company built a while ago. You update the code and then run rake db:migrate.

rake aborted! (See full trace by running task with --trace)

D’oh.

I call this seemingly inevitable breakdown “migration decay.”

This goes against the general principles we’ve been talking about. It’s not automatic. It adds barriers when you add new members to a project, stressing the communications channels.

Note that this schema.rb definition is the authoritative source for your database schema. If you need to create the application database on another system, you should be using db:schema:load, not running all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations you’ll amass, the slower it’ll run and the greater likelihood for issues).

db/schema.rb

Ideally, I’m for using migrations and keeping them running from version 0 to version n. Migrations are cool because they can do it all: change your schema, load your seed data, and modify data on the server.

But as your models change, the old migrations don’t. And so it’s very likely that you’ll end up with migrations that stop working.

With enough work, you can probably keep them running—most of the time. But I’ve just been in too many situations where the migrations have failed. Worse is if they fail half-way in, because the migration number’s not incremented, but half of it has been applied. So the next run will fail, too, and then you have to fix the database manually. Bleh.

This has led me to re-evaluate the use of schema.rb. Since schema.rb can’t load data, that means you need to separate migrations and data loading.

Re-organizing migrations

I’ve worked on a couple of projects where the existing migrations were consolidated into a single migration at the end of the development cycle.

This seems like too much trouble to me. Especially if we are giving up on migrations as the way to create the database. Who cares how many there are in that case? Plus you have to screw around with the server to re-set the schema_info table.

rake db:reset

On my latest project, I have been editing my migrations as needed and then re-creating the database with rake db:reset db:migrate.

I can only do this because it’s a small project and we haven’t deployed yet. But it does keep the migrations much cleaner and easier to understand.

Conflicting migrations

Anyone ever have this happen?

Chances are if you’ve worked on a project with more than a couple developers, you’re going to get conflicting migrations checked into your source control. And the more developers you have, the more time your whole team will waste cleaning up afterwards. Branches present another problem for migrations.

The root problem is that migrations layer another level of version control on top of your SCM. For a really insightful look at this thorny problem, I recommend reading the Django project’s Schema Evolution wiki page, particularly this part. (No, they haven’t solved it either.)

Some potential solutions:

  1. Use a SCM hook to prevent checking in conflicting migrations. There is a pre-commit hook that does this for Subversion. This wouldn’t work for branches, though.
  2. Use a plugin that extends migrations so they aren’t based (exclusively) on numbers. ELC Technologies has a plugin called Duplicate Migrations that allows this. There have also been a few other attempts, most notably Enhanced Migrations from Revolution Health and Independent Migrations by Courtenay.
  3. Write all the changes directly in schema.rb and let Auto-migrations take care of the database while your SCM keeps track of the merging.

I find the last option attractive, but I’m not sure I’d trust it for production use.

I really like the Duplicate Migrations plugin. Not only can developers concurrently change the database without problems, but development could continue on a maintenance branch without causing problems for the mainline. I plan to use it on my next big project.

I’ve also used the SCM hook approach with great success (within a single branch of development). There, the person who checks in their code “last” has to fix the problem, so it costs them some time. But it doesn’t kill everyone else’s time, too.
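
The check at the heart of such a hook is simple. This is a sketch of the idea, not the Subversion hook mentioned above; conflicting_migrations is a hypothetical helper and the filenames are invented:

```ruby
# Group migration filenames by their numeric prefix and return only the
# prefixes that are claimed by more than one file.
def conflicting_migrations(filenames)
  filenames
    .group_by { |name| File.basename(name)[/\A\d+/] }
    .reject { |number, names| number.nil? || names.size < 2 }
end

# Hypothetical contents of db/migrate after two developers both added "004".
files = [
  "db/migrate/003_create_venues.rb",
  "db/migrate/004_add_capacity_to_venues.rb",
  "db/migrate/004_create_events.rb"
]

conflicting_migrations(files).each do |number, names|
  puts "Migration #{number} is used by: #{names.join(', ')}"
end
```

A real hook would run this against the files in the incoming commit plus what is already in the repository, and reject the commit when the result is not empty.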

Your $0.02 here…

So, how do you deal with migrations? Let me know in the comments. Thanks!

Bigger projects require better tools

Posted by Luke Francl
on Sunday, January 13

We’ve seen how having more developers on a project increases the number of communication channels dramatically.

Project size also has a direct impact. To quote Code Complete:

Project size is easily the most significant determinant of effort, cost and schedule [for a software project]. People naturally assume that a system that is 10 times as large as another system will require something like 10 times as much effort to build. But the effort for a 1,000,000 LOC system is more than 10 times as large as the effort for a 100,000 LOC system.

For any non-trivial program, it is impossible to keep the whole thing in your head at once, no matter how smart you are. So in order to build software, we have to constantly battle against complexity. We split things into pieces; we automate processes so we don’t have to think about them; we document so we can remember later or delegate to others.

When it’s just me working on a project, I can track bugs on a piece of paper; there are never any problems with merges or conflicting migrations; and I know exactly who broke the test cases. As the number of developers increases, I need better tools. I need a real bug tracker, continuous integration to run the tests, and a way to deal with conflicting database changes.

What is the bearing of this on Rails? My goal is to look at how we can smooth over the bumpy edges of building a Rails project. These are the things that trip you up, that break your flow.

  • Managing migrations
  • Loading your initial data
  • Managing third-party code
  • Perhaps more…

Finally, I’ll try to answer the question: does it even matter? That is, will you get enough of a productivity boost to make these techniques worth it?

Working with others (effectively)

Posted by Luke Francl
on Monday, January 07

This weekend, I’ve been working on my presentation for acts_as_conference. I’m talking about best practices for working on a Rails team. I’m not approaching this as if I were the final authority on how to program Rails; instead, I’ll be offering up some examples of problems I’ve seen and ways of dealing with them.

At a higher level, I want to talk about working together on an effective software team. In order to do that, I’m re-reading some of my favorite books about programming: Code Complete, The Mythical Man-Month, Patterns of Software (download), Peopleware and others.

This research has made me somewhat re-think my thesis, which focuses on what we can do to improve Rails teams’ productivity. What I’ve realized is that much of this is out of the programmers’ hands; and also that “productivity” is something of a biased term. In “Productivity: Is there a Silver Bullet?” Richard Gabriel writes:

The word productivity itself is somewhat loaded. Do programmers working at some company wish they were more “productive”? Probably not. Such programmers wish they were more effective, that their tools were better, that there were fewer barriers to getting work done, that management was more competent, that working conditions were better, that there was some direct link between the success of that company and the efforts of the programmers, and that it wasn’t a struggle for the company to survive from quarter to quarter. Productive is a term used to describe what an employee is or isn’t.

The essay goes on to argue that to a large degree, developer productivity is not based on technology, but the programmers’ environment.

As a programmer, I don’t have a lot of control over that, but we have to acknowledge its impact. Since I’m a programmer, not a manager, my talk will focus on what we, as programmers, can do to work better together.

It gets harder to work on a team the bigger it gets. Look at the quadratic growth of communication channels between developers as a team grows.

By the time we have 6 developers working on a project, there are 15 communication channels and things are starting to get hard to manage (let alone draw in OmniGraffle). And this is a pretty small team!
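
The arithmetic behind that number: every pair of developers is one channel, so a team of n has n(n - 1)/2 of them. A quick sketch (communication_channels is just a name I picked for illustration):

```ruby
# Number of pairwise communication channels on a team of n people.
# Each of the n developers can talk to n - 1 others, and each pair
# is counted once, hence the division by two.
def communication_channels(n)
  n * (n - 1) / 2
end

(1..8).each do |n|
  puts "#{n} developers: #{communication_channels(n)} channels"
end
```

Doubling the team roughly quadruples the channels, which is why adding people gets expensive so quickly.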

None of this is news. But what does Rails bring to the picture? By removing a lot of the busy-work associated with older frameworks, it lets fewer programmers get more done, which helps our communications problem. But Rails teams have to overcome some hurdles of their own…

Stay tuned, I’ll be posting more on this subject this week, and I hope some of you will be able to join me in Orlando to discuss this!

Speaking gig: acts_as_conference

Posted by Luke Francl
on Friday, November 16

I’m proud to have been selected as one of the nine speakers for acts_as_conference happening February 8 and 9 in Orlando, Florida.

speaker @ acts_as_conference

The speakers look great, and it’s a real honor to have been chosen. Obie Fernandez and Dan Benjamin will be keynoting. Along with myself, the speakers will be Peter Armstrong, Anthony Eden, Neal Ford, Bryan Liles, fellow RUMie Charles Nutter, Josh Owens, and Charles Brian Quinn. There will also be a charity session with Ezra Zygmuntowicz and Evan Phoenix discussing Merb and Rubinius.

My topic will be Working with others: Best Practices for Rails Teams. I’ll explore this over the next few months here on the blog as I flesh out my ideas. I’d like to hear from people on Rails teams to see how they manage the communications challenges of software development.

I hope to see you there! The conference is a very reasonable $100, and I for one can’t wait to get out of Minnesota for a few days this winter.