Measuring your test coverage with Heckle and RCov

Posted by Jon
on Thursday, November 29

I gave a presentation at RUM on Monday about code metrics. In particular, I showed tools for measuring two aspects of code: test coverage and complexity. Here are my slides.

Saikuro and Flog measure code complexity. Saikuro measures cyclomatic complexity, the number of independent paths through a method. Flog, on the other hand, parses your code and assigns a complexity value to assignments, branches, and calls. The goal, of course, is to minimize code complexity. This is an important goal, but I’m not sure yet what I think of these measurement tools. I haven’t used them enough to know if they have practical value.
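For a sense of what cyclomatic complexity counts, here is a small sketch. The shipping_band method is made up for illustration, and the figure in the comment comes from the usual "decision points plus one" rule, not from running Saikuro on it.

```ruby
# Hypothetical method, used only to illustrate the counting.
# Decision points: the `if` and the `elsif` (2), so by the usual
# "decision points + 1" rule its cyclomatic complexity is 3.
def shipping_band(weight)
  if weight > 20
    :freight
  elsif weight > 5
    :parcel
  else
    :letter
  end
end

# One input per independent path exercises all three branches.
[25, 10, 1].each do |weight|
  puts "#{weight} kg -> #{shipping_band(weight)}"
end
```

Every extra branch adds another path that needs its own test input, which is the practical argument for keeping the number low.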

Heckle and RCov, on the other hand, are useful. I’m going to look at each in more detail here.

RCov

RCov measures C0 (line) code coverage. That is, it runs your test suite and records which lines of your application were executed. It then gives you a nice HTML report with red and green lines: red for lines of code that were not run, and green for lines that were.
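To make the red/green idea concrete, here is a minimal sketch of line coverage. It uses the Coverage module from Ruby's standard library, not RCov itself, and the shipping_cost method and the temp file are made up for the illustration. RCov's real report is HTML, not this text output.

```ruby
require "coverage"
require "tempfile"

# Hypothetical application code. It is written to a temp file because
# Coverage only tracks files loaded after Coverage.start.
source = <<~RUBY
  def shipping_cost(weight)
    if weight > 10
      20
    else
      5
    end
  end
RUBY

counts = nil
Tempfile.create(["shipping_cost", ".rb"]) do |file|
  file.write(source)
  file.flush

  Coverage.start
  load file.path
  shipping_cost(3) # stands in for the test suite: only the light branch runs

  # Coverage.result maps each loaded path to per-line execution counts;
  # nil marks lines with nothing to execute, such as `else` and `end`.
  counts = Coverage.result.find do |path, _line_counts|
    File.basename(path) == File.basename(file.path)
  end.last
end

# Print each source line with a marker in the spirit of RCov's colours.
source.lines.zip(counts).each do |line, count|
  status = case count
           when nil then "     "
           when 0   then "RED  "
           else          "GREEN"
           end
  puts "#{status} #{line}"
end
```

The line that returns 20 is the only one that comes out red, because no call ever passes a weight above 10.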

If your test suite doesn’t execute a line of your application code, it is safe to say that that line is not tested. On the other hand, if a line of your application is run, it is NOT safe to say that it IS tested. A test method with no asserts works just fine for RCov’s purposes, thank you very much. Take a look at this code.

def test_user_assignment
  User.assign
end

This test is enough for RCov to mark every line it executes in User.assign as covered. But nothing is asserted, and so nothing is tested. The same problem applies even if you aren’t in the habit of writing tests without assertions; you may make assertions about some aspects of a method, but forget about other aspects. And RCov won’t tell you this.
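Here is a small plain-Ruby sketch of that gap. The User class, its deliberate bug, and the run_test helper are all made up for illustration. run_test only mimics how a test framework treats a test as passing when nothing raises.

```ruby
# Hypothetical User class with a deliberate bug: assign should record
# the owner but stores nil instead.
class User
  attr_reader :owner

  def assign(owner)
    @owner = nil # bug: should be `@owner = owner`
  end
end

# Runs a block and reports :pass unless it raises, which is roughly how
# a test framework decides whether a test method succeeded.
def run_test
  yield
  :pass
rescue StandardError
  :fail
end

# Executes every line of User#assign, so a line-coverage tool would show
# the method as covered, but asserts nothing.
coverage_only = run_test { User.new.assign("jon") }

# Executes the same lines and also checks the result.
with_assertion = run_test do
  user = User.new
  user.assign("jon")
  raise "owner not recorded" unless user.owner == "jon"
end

puts "coverage-only test: #{coverage_only}"
puts "asserting test:     #{with_assertion}"
```

Both tests produce identical line coverage for User#assign, yet only the second one notices the bug.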

Logically speaking, RCov tells you that if line_is_red, then !line_is_tested. From this, you can also infer the contrapositive: if line_is_tested, then !line_is_red. But that’s all you know. If a line is green, RCov tells you nothing at all. Saying if !line_is_red, then line_is_tested is a formal fallacy (denying the antecedent). And that’s bad.

So 100% RCov coverage is not equal to 100% test coverage. In fact, a high RCov number guarantees nothing about how well your code is tested. Your code could have 100% or 95% or 75% RCov coverage, and be extremely poorly tested.

In my experience, RCov is a one-time tool. That’s because green lines in RCov don’t tell you anything at all about your test coverage. Red lines provide the real value. If you run RCov, find an untested method, and write up a quick test hack that provides C0 coverage, RCov will never complain about that method again. It will be off your RCov radar. This is too bad, because it is really useful to know what is poorly tested. So whenever you see red in RCov, take the time to write comprehensive tests to cover the untested code.

Heckle

Heckle is a mutation tester: it changes your code and checks whether your tests catch the changes. If Heckle can change true to false, change 32 to nil, or remove a method call in your application without causing a test failure, then your code isn’t tested well enough. To run it effectively, do this:

heckle Class method -t /test/units/class_test.rb -T 30

heckle is the tool, installed as a Ruby gem. Class is the name of the Ruby class you want to heckle. method is a method on the class; you can leave this out, but I don’t recommend it. -t /test/units/class_test.rb is the path to the unit test you want to use (also optional). Finally, -T 30 specifies a timeout for the test, in case your mutation creates an infinite loop.
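What a mutation "without a test failure" looks like is easier to see by hand. The sketch below does not use Heckle at all. The discount lambda, the mutant, and both tiny test suites are made up, and the mutation (a literal replaced with nil) is written by hand, where Heckle would generate it.

```ruby
# Hypothetical method under test: orders over 100 earn a discount of 10.
original = ->(total) { total > 100 ? 10 : 0 }

# Hand-written mutant: the literal 10 is replaced with nil.
mutant = ->(total) { total > 100 ? nil : 0 }

# A weak suite that only checks the no-discount case.
weak_suite = ->(discount) { discount.call(50) == 0 }

# A stronger suite that also checks the discounted case.
strong_suite = lambda do |discount|
  discount.call(50) == 0 && discount.call(150) == 10
end

{ "weak" => weak_suite, "strong" => strong_suite }.each do |name, suite|
  # A suite must pass on the unmodified code before a mutant means anything.
  raise "#{name} suite fails on the original code" unless suite.call(original)

  verdict = suite.call(mutant) ? "survives (gap in the tests)" : "is caught"
  puts "#{name} suite: mutant #{verdict}"
end
```

Both suites execute the same lines of the method, so line coverage cannot tell them apart. Only the mutant shows which suite actually checks the discounted result.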

You can leave out the method name, the -t test path, and the -T timeout, and just run Heckle with a class:

heckle Class

But I don’t recommend it.

First, it will take forever.

Second, you may run into infinite loops.

Third, Heckle will unfortunately test EVERY method available to a class, including methods mixed in from modules and inherited from superclasses. So if you’re heckling an ActiveRecord class, you’re going to see dozens of Rails magic methods, not just the methods that you wrote.

Fourth, your UserTest should cover your User class on its own, if your code is well written and well tested; it shouldn’t rely on the ProductTest class (or another test). One problem with Heckle is that it doesn’t distinguish between well-tested code and highly coupled code, where a small change somewhere causes the application to fall apart somewhere else. This problem can be minimized by heckling a single method against a single test class.
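The third point is easy to see in plain Ruby, without Heckle. In this sketch the Record base class and the Auditable module are made-up stand-ins for a framework superclass and a mixin. Module#instance_methods is core Ruby, and passing false restricts it to methods defined directly in the class.

```ruby
# Hypothetical stand-ins for a framework mixin and base class.
module Auditable
  def audit_log; end
end

class Record
  def save; end
  def destroy; end
end

class Product < Record
  include Auditable

  def price; end
end

# Everything callable on a Product, minus what every Object already has.
available = Product.instance_methods - Object.instance_methods

# Only the methods defined in the Product class body itself.
own = Product.instance_methods(false)

puts "available: #{available.sort.inspect}"
puts "own:       #{own.inspect}"
```

With a real ActiveRecord class the first list would be far longer, which is why naming the method you want to heckle saves so much time.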

I like Heckle and find it pretty useful. Unfortunately, it needs a little developer love. The -T timeout parameter is flaky; Heckle doesn’t always play nicely with its dependencies (especially ParseTree 2.0.x, the current version); and it would be more useful if by default it only heckled the methods defined directly in a class, not methods brought in through parent classes, includes, or fancy metaprogramming. This is a shame, because it is really a great tool. Hopefully Kevin Clark and Ryan Davis have an update in the works.