By Miško Hevery
A lot of people have been asking me lately what the cost of testing is, so I decided to try to measure it and dispel the myth that testing takes twice as long.
For the last two weeks I have been keeping track of the amount of time I spent writing tests versus the time I spent writing production code. The number surprised even me, but after I thought about it, it made a lot of sense. The magic number is about 10% of time spent on writing tests. Now, before you think I am nuts, let me back it up with some real numbers from a personal project I have been working on.
                  Total    Production   Test     Ratio (test / total)
Commits           1,347    1,347        1,347
LOC               14,709   8,711        5,998    40.78%
JavaScript LOC    10,077   6,819        3,258    32.33%
Ruby LOC          4,632    1,892        2,740    59.15%
Lines/Commit      10.92    6.47         4.45     40.78%
Hours (estimate)  1,200    1,080        120      10.00%
Hours/Commit      0.89     0.80         0.09
Mins/Commit       53       48           5
Commits refers to the number of commits I have made to the repository. LOC is lines of code, broken down by language. The ratio shows the typical breakdown between production and test code when you test drive: test code is roughly half of the total, give or take depending on the language. It is interesting to note that on average I commit about 11 lines, of which 6.5 are production and 4.5 are test. Keep in mind that this is an average: a lot of commits are large ones where you add a lot of code, but there are also a lot of commits where you are just tweaking things, so the average is quite low.
The number of hours spent on the project is my best estimate, as I have not kept track of it. The 10% breakdown comes from keeping track of my coding habits over the last two weeks. Both are my best guesses.
Now, when I test drive, I start by writing a test, which usually takes me a few minutes (about 5 minutes). The test represents my scenario. I then start implementing the code to make the scenario pass, and the implementation usually takes me a lot longer (about 50 minutes). The ratio is highly asymmetrical! Why does it take me so much less time to write the scenario than it does to write the implementation, given that they are about the same length? Well, look at a typical test and implementation:
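The ratios in the table can be re-derived from the raw counts. The following is only a sketch of that arithmetic, not something from the project itself: the variable and function names are made up for illustration, and the hour figures are the same rough estimates described above.

```javascript
// Figures copied from the table above; names are illustrative only.
const commits = 1347;
const loc = {
  javascript: { production: 6819, test: 3258 },
  ruby: { production: 1892, test: 2740 },
};
const hours = { production: 1080, test: 120 }; // the author's estimate

// Share of `test` in (production + test), as a percentage with two decimals.
function testShare({ production, test }) {
  return ((100 * test) / (production + test)).toFixed(2) + "%";
}

// Sum one field ("production" or "test") across all languages.
function sumLoc(field) {
  return Object.values(loc).reduce((sum, lang) => sum + lang[field], 0);
}

const totals = { production: sumLoc("production"), test: sumLoc("test") };

const summary = {
  jsTestShare: testShare(loc.javascript),
  rubyTestShare: testShare(loc.ruby),
  overallTestShare: testShare(totals),
  hoursTestShare: testShare(hours),
  linesPerCommit: ((totals.production + totals.test) / commits).toFixed(2),
  minutesPerCommit: Math.round((60 * (hours.production + hours.test)) / commits),
};

console.log(summary);
```

Note that the 10% figure is a ratio of hours, while the roughly 40% figure is a ratio of lines; the two measure different things, which is the point of the comparison.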
Here is a typical test for a feature:
ArrayTest.prototype.testFilter = function() {
  var items = ["MIsKO", {name:"john"}, ["mary"], 1234];
  assertEquals(4, items.filter("").length);
  assertEquals(4, items.filter(undefined).length);
  assertEquals(1, items.filter('iSk').length);
  assertEquals("MIsKO", items.filter('isk')[0]);
  assertEquals(1, items.filter('ohn').length);
  assertEquals(items[1], items.filter('ohn')[0]);
  assertEquals(1, items.filter('ar').length);
  assertEquals(items[2], items.filter('ar')[0]);
  assertEquals(1, items.filter('34').length);
  assertEquals(1234, items.filter('34')[0]);
  assertEquals(0, items.filter("I don't exist").length);
};

ArrayTest.prototype.testShouldNotFilterOnSystemData = function() {
  assertEquals("", "".charAt(0)); // assumption
  var items = [{$name:"misko"}];
  assertEquals(0, items.filter("misko").length);
};

ArrayTest.prototype.testFilterOnSpecificProperty = function() {
  var items = [{ignore:"a", name:"a"}, {ignore:"a", name:"abc"}];
  assertEquals(2, items.filter({}).length);
  assertEquals(2, items.filter({name:'a'}).length);
  assertEquals(1, items.filter({name:'b'}).length);
  assertEquals("abc", items.filter({name:'b'})[0].name);
};

ArrayTest.prototype.testFilterOnFunction = function() {
  var items = [{name:"a"}, {name:"abc", done:true}];
  assertEquals(1, items.filter(function(i){return i.done;}).length);
};

ArrayTest.prototype.testFilterIsAndFunction = function() {
  var items = [{first:"misko", last:"hevery"},
               {first:"mike", last:"smith"}];
  assertEquals(2, items.filter({first:'', last:''}).length);
  assertEquals(1, items.filter({first:'', last:'hevery'}).length);
  assertEquals(0, items.filter({first:'mike', last:'hevery'}).length);
  assertEquals(1, items.filter({first:'misko', last:'hevery'}).length);
  assertEquals(items[0], items.filter({first:'misko', last:'hevery'})[0]);
};

ArrayTest.prototype.testFilterNot = function() {
  var items = ["misko", "mike"];
  assertEquals(1, items.filter('!isk').length);
  assertEquals(items[1], items.filter('!isk')[0]);
};
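The tests above rely on a harness that supplies assertEquals and runs every test* method on the test case's prototype. The post does not show that harness, so the following is only a rough stand-in to illustrate the mechanics: assertEquals here uses plain strict equality, and ExampleTest and runTests are hypothetical names invented for this sketch.

```javascript
// Hypothetical stand-in for the harness's assertEquals(expected, actual).
// It uses strict equality only; a real harness may compare more deeply.
function assertEquals(expected, actual) {
  if (expected !== actual) {
    throw new Error("expected " + String(expected) + " but was " + String(actual));
  }
}

// Hypothetical test case: a constructor whose prototype holds test* methods.
function ExampleTest() {}

ExampleTest.prototype.testIndexOfIsCaseSensitive = function () {
  assertEquals(-1, "MIsKO".indexOf("isk"));
  assertEquals(1, "MIsKO".toLowerCase().indexOf("isk"));
};

ExampleTest.prototype.testDeliberateFailure = function () {
  assertEquals(1, 2); // fails on purpose to show how failures are reported
};

// Run every method whose name starts with "test" on a fresh instance,
// so that no state leaks from one test into the next.
function runTests(TestCase) {
  const results = { passed: [], failed: [] };
  for (const name of Object.keys(TestCase.prototype)) {
    if (!name.startsWith("test")) continue;
    try {
      new TestCase()[name]();
      results.passed.push(name);
    } catch (error) {
      results.failed.push(name);
    }
  }
  return results;
}

const results = runTests(ExampleTest);
console.log(results);
```

Each test runs on its own instance and shares nothing with the others, which is part of why tests like the ones above stay simple to write.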
Now here is the code which implements the scenario tests above:
Array.prototype.filter = function(expression) {
  // A value passes only if every collected predicate accepts it.
  var predicates = [];
  predicates.check = function(value) {
    for (var j = 0; j < predicates.length; j++) {
      if (!predicates[j](value)) {
        return false;
      }
    }
    return true;
  };
  var getter = Scope.getter; // property-path getter defined elsewhere in the project
  // Case-insensitive substring search; a leading '!' negates the match.
  var search = function(obj, text) {
    if (text.charAt(0) === '!') {
      return !search(obj, text.substr(1));
    }
    switch (typeof obj) {
      case "boolean":
      case "number":
      case "string":
        return ('' + obj).toLowerCase().indexOf(text) > -1;
      case "object":
        // Arrays also land here, since typeof reports them as "object".
        for (var objKey in obj) {
          if (objKey.charAt(0) !== '$' && search(obj[objKey], text)) {
            return true;
          }
        }
        return false;
      case "array":
        // Never reached: typeof does not return "array".
        for (var i = 0; i < obj.length; i++) {
          if (search(obj[i], text)) {
            return true;
          }
        }
        return false;
      default:
        return false;
    }
  };
  switch (typeof expression) {
    case "boolean":
    case "number":
    case "string":
      expression = {$:expression};
      // fall through: a primitive is treated as a search across all properties
    case "object":
      for (var key in expression) {
        if (key == '$') {
          (function(){
            var text = (''+expression[key]).toLowerCase();
            if (!text) return;
            predicates.push(function(value) {
              return search(value, text);
            });
          })();
        } else {
          (function(){
            var path = key;
            var text = (''+expression[key]).toLowerCase();
            if (!text) return;
            predicates.push(function(value) {
              return search(getter(value, path), text);
            });
          })();
        }
      }
      break;
    case "function":
      predicates.push(expression);
      break;
    default:
      return this;
  }
  var filtered = [];
  for (var j = 0; j < this.length; j++) {
    var value = this[j];
    if (predicates.check(value)) {
      filtered.push(value);
    }
  }
  return filtered;
};
Now, I think that if you look at these two chunks of code, it is easy to see that even though they are about the same length, one is much harder to write. The reason tests take so little time to write is that they are linear in nature: no loops, ifs or interdependencies with other tests. Production code is a different story. I have to create complex ifs and loops, and I have to make sure that the implementation works not just for one test, but for all tests. This is why it takes so much longer to write production code than test code. In this particular case, I remember rewriting this function three times before I got it to work as expected. :-)
So a naive answer is that writing tests carries a 10% tax. But we pay taxes in order to get something in return. Here is what I get for that 10%, and how it pays me back:
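To make the asymmetry concrete, here is a trimmed-down, standalone sketch of just the text-search rules, followed by checks modelled on the tests above. It is not the original implementation: matches and filterByText are names made up for this example, only string searches are handled, and it leans on the built-in Array.prototype.filter rather than replacing it.

```javascript
// Simplified sketch of the text-search rules (hypothetical names).
function matches(value, text) {
  // A leading "!" negates the rest of the search text.
  if (text.charAt(0) === "!") {
    return !matches(value, text.slice(1));
  }
  if (value === null || value === undefined) return false;
  if (typeof value === "object") {
    // Arrays and plain objects match if any member matches;
    // keys starting with "$" are skipped.
    return Object.keys(value).some(
      (key) => key.charAt(0) !== "$" && matches(value[key], text)
    );
  }
  if (typeof value === "function") return false;
  // Primitives: case-insensitive substring match.
  return String(value).toLowerCase().includes(text);
}

function filterByText(items, text) {
  const needle = String(text).toLowerCase();
  if (!needle) return items.slice(); // an empty search keeps everything
  return items.filter((item) => matches(item, needle));
}

// The checks are a straight line: no loops, no branches.
const items = ["MIsKO", { name: "john" }, ["mary"], 1234];
const checks = [
  filterByText(items, "").length === 4,
  filterByText(items, "iSk")[0] === "MIsKO",
  filterByText(items, "ohn")[0] === items[1],
  filterByText(items, "ar")[0] === items[2],
  filterByText(items, "34")[0] === 1234,
  filterByText(items, "I don't exist").length === 0,
  filterByText([{ $name: "misko" }], "misko").length === 0,
  filterByText(["misko", "mike"], "!isk")[0] === "mike",
];

console.log(checks.every(Boolean));
```

Even in this reduced form, the function needs recursion and several branches, while each check is a single expression that can be written without thinking about the others.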
When I implement a feature, I don't have to start up the whole application and click through several pages until I get to the page where I can verify that the feature works. In this case it means that I don't have to keep refreshing the browser, waiting for it to load a dataset, typing in some test data and manually asserting that I got what I expected. This is immediate payback in time saved!
Regression is almost nil. Whenever you add a new feature, you run the risk of breaking something other than what you are working on (since you are not working on it, you are not actively testing it). At least once a day I have a "what the @#$%" moment when a change suddenly breaks a test at the opposite end of the codebase, which I did not expect, and I count my lucky stars. This saves a lot of the time you would otherwise spend when you discover that a feature you thought was working no longer is, by which point you have forgotten how the feature is implemented.
Cognitive load is greatly reduced, since I don't have to keep all of the assumptions about the software in my head. This makes it really easy to switch tasks or to come back to a task after a meeting, a good night's sleep or a weekend.
I can refactor the code at will, keeping it from becoming stagnant and hard to understand. Stagnation is a huge problem on large projects, where the code works, but it is really ugly and everyone is afraid to touch it. Being able to refactor is worth money tomorrow, because it keeps you going.
These benefits translate to real value today as well as tomorrow. I write tests because the additional benefits I get more than offset the additional cost of 10%. Even if I don't include the long-term benefits, the value I get from tests today is well worth it. I am faster at developing code with tests. How much faster? That depends on the complexity of the code. The more complex the thing you are trying to build (more ifs/loops/dependencies), the greater the benefit of tests.
So now you understand my puzzled look when people ask me how much slower or costlier development with tests is.
Impressive analysis! I think people could quibble with your hours estimate, but that's really just a bikeshed argument.
Those four bullets explaining the benefits of your test "tax" are probably the best justification for test-driven development that I have ever read.
An independent or spare-time developer really benefits from the "tax", since it's so much easier to get back into your code, start modifying locally, and not worry too much about breaking the rest of the system. Your typical MegaCorp can just throw money at the problem, so it's masked, but those running lean can't do that. Also, a MegaCorp that wants to make more profit would do well to heed this advice.
boolean not bolean, right?
Nice post Misko :)
Is it possible that the cost of test code depends upon the nature of the production code? 10% for leaf functions like this one, more for components in the middle of the application that interact with other components and require you to write stubs (or set up complicated fixtures) in order to test?
Good post...
But I am puzzled. What makes you believe that writing unit tests is all that is required for testing? Is that all you do for testing?
Shrini Kulkarni
Hi Misko,
My name is Gil Zilberfeld, and with my colleague, Roy Osherove, we do a little video cast called "what's new in testing".
This week we mention your analysis, so I invite you to come. If you like what you see, please promote us.
Thanks,
Gil Zilberfeld
Typemock