Tuesday, March 25, 2008

Why do test antipatterns emerge?

In the previous post I presented an example of what I call the Soap Opera Test Antipattern, and some possible side effects such as test code implicitly coupled to the application code. This post grew out of a discussion which is still going on in the Bologna XP mailing list, and was reinforced by this post by Jason Gorman. Of course, every methodology works perfectly well …in theory. But in practice, testing leaves us with a bunch of challenging issues when the methodology is applied (more or less blindly) to real-world situations.

So why do we end up with Soap Opera tests in our code? I think one reason is rooted in the heart of the TDD mantra "Red, Green, Refactor". Here's why:

  1. Red. You want to add a new requirement, and you do so by adding the corresponding test. You're done when you have added the test and running it results in a red bar.
  2. Green. You get to the green bar as quickly as possible. Hacks are allowed here, because staying far from green means diving too deep, with no idea of what it takes to get back. You're done when you have the green bar again in your xUnit test suite.
  3. Refactor. This is a green-to-green transition that allows you to clean up the code, remove duplications, and make the code look better than in step 2.
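
Here is a minimal sketch of what steps 2 and 3 can look like. The classes and the `total` method are made up for illustration; they come neither from the TDD book nor from any real project. The only test assumed to exist is "2 items at 150 cents cost 300 cents".

```java
// Step 2 (Green): the quickest hack that satisfies the single existing
// test, total(2, 150) == 300. Hypothetical example code.
final class TotalAfterGreen {
    static int total(int quantity, int unitPriceCents) {
        return 300; // hard-coded: duplicates the value written in the test
    }
}

// Step 3 (Refactor): a green-to-green change that removes the duplication
// between the test data and the production code.
final class TotalAfterRefactor {
    static int total(int quantity, int unitPriceCents) {
        return quantity * unitPriceCents;
    }
}
```

Both versions give a green bar for the one test we have, so the bar alone cannot tell whether step 3 ever happened.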

Step 3 looks a little weaker than the others, for a few reasons:

  • It's the third step. If you're time-boxed, this is where you're gonna cut, by saying "done" to your boss, even if you feel that something's still missing.
  • The termination condition is less well defined than in steps 1 and 2: "green" is a lot less disputable than "clean". To declare step 3 over you have to satisfy your "personal definition of code beauty", assuming you have one. Moreover, refactoring goals are often personal: the TDD book suggests writing them on a piece of paper and keeping it for the day. This means that your refactoring goals are not shared with the team. This is not a mandatory approach; for example, I am the kind of guy that normally starts polluting the bug tracking system with refactoring suggestions. But I also know that very few of them will actually make it to production code (unless I am supremely in charge of the project…). Anyway, I think that most of the time refactoring notes are too trivial to be shared on the bug tracking system, and the best way to keep them out of it is to fix them before they have to become reminders.
  • It's a matter of culture. If you're doing TDD but lack some crucial OOP skill, you're in danger of writing sloppy tests. There's a lot of good OO in a framework like JUnit, and its designers made it good enough that the OO part is well hidden behind the scenes. But this does not mean that developers should code like Neanderthals when it comes to writing tests.

Putting it all together, the result is often test code which is less effective than it should be.



Friday, March 14, 2008

The soap opera test antipattern

If you are coming from a romantic programmer attitude, or simply didn't care about testing your code, then every single line of test code is valuable and adds some stability to your system.

After a while, however, the mass of testing code can grow significantly and become problematic if it is not managed correctly. I've pointed you to the Coplien vs Martin video in my previous post. Now I won't claim that I've found the solution to the issue, but some thoughts on the topic might be worth sharing.

Starting to test

When embracing TDD or test first, or – less ambitiously – when starting to use xUnit frameworks for testing, you simply have to start from somewhere. You choose the target class or component, define the test goal and code your test using assertions to check the result. If the light is green then the code is fine, if it's red… well, you have a problem. You solve the problem, refactor the solution to make it better, in a green-to-green transition, then move to the next feature, or the next test (which will be the same thing, if you are a TDD purist).

Every test adds stability and confidence to your code base, so it should be a good thing. Unfortunately, when the test code mass reaches a certain weight it starts making refactoring harder, because it looks like extra code to be affected in a refactoring process, making refactoring estimations more pessimistic, and the whole application less flexible.

Why does this happen? I suspect testing skills tend to be a little underestimated. JUnit examples are pretty simple, and some urban legends (like "JUnit is only for unit tests") are misleading. Testing somehow is a lot better than not testing at all. Put it all together in a large-scale project and you're stuck.

The soap opera test antipattern

The most typical symptom of this situation is what I call the soap-opera test: a test that looks like an endless script.

@Test
public void testSomething() {
// create object A

// do something with this A

// assert something about A

// do something else with A

// assert something about A

// create object B

// assert something about B

// do something with B

// assert something about B

// do something with B and A

// assert something about B and A

}

The main reason why I named this one "soap opera" is straightforward: there is no clear plot, there are many characters whose role is unclear, things happen slowly, conversations are filled with a lot of "do you really mean what you said?", and there is no defined end. The second reason is that I always dreamed of naming a pattern, or an antipattern… somehow.

Even if I was too lazy (or sensible) to put some real code in there, some issues are pretty evident:

  • The test looks like a long script;
  • If you're lucky, the purpose of the test is in the method name or in the javadoc; there are too many assertions to keep the test readable or to work out its purpose by simply reading the code;
  • I bet a beer that 90% of the lines you have in a test like this are simply cut & paste from another test in the same class (if this is the only test you have in your system the bet is not valid);
  • The test can get red for too many reasons;
  • It really looks like the inertial test code mass mentioned before.

What's the point in "looks like a long script"? My opinion is simply that it doesn't have to look like that! A good test has a well-defined structure, which is:

  1. Set up
  2. Declare the expected results
  3. Exercise the unit under test
  4. Get the actual results
  5. Assert that the actual results match the expected results

I grabbed the list from here; the original article talks about many JUnit antipatterns (but calls the soap opera antipattern "the overly complex test", which is a lot less glamorous). Setting up can't be accomplished completely by the setUp() method, because some preparation is obviously test-specific. Steps 3 and 4 often overlap, especially if you're testing a function. But the whole point is that this definitely is a structure, while a script is something less formed.
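
To make the five steps concrete, here is a minimal sketch in plain Java. `ShoppingCart` and its methods are hypothetical, invented just for this example. I also left out the JUnit dependency so that the snippet compiles on its own: with JUnit 4 on the classpath the test method would carry `@Test` and the last step would be an `assertEquals(expected, actual)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unit under test, invented for this sketch.
final class ShoppingCart {
    private final List<Integer> pricesInCents = new ArrayList<>();

    void add(int priceInCents) {
        pricesInCents.add(priceInCents);
    }

    int total() {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }
}

final class ShoppingCartTest {
    // In JUnit 4 this would be an instance method annotated with @Test.
    static void totalIsSumOfItemPrices() {
        // 1. Set up
        ShoppingCart cart = new ShoppingCart();
        cart.add(250);
        cart.add(100);

        // 2. Declare the expected results
        int expected = 350;

        // 3. Exercise the unit under test
        // 4. Get the actual results (the two steps overlap for a function)
        int actual = cart.total();

        // 5. Assert that the actual results match the expected results
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

One plot, one character, one ending: the opposite of a script that keeps asserting as it goes.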

Multiplying the asserts has a thrilling effect: when something goes wrong, all of your tests start getting red. In theory a test should test one and only one feature. There are obviously dependent features, but a well-formed test suite will help you a lot in problem determination by pointing right to the root cause. If the testing code for a feature is duplicated all over the test suite… you just get a lot of red lights but no hint about where the problem is.
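
As a rough sketch of the "one test, one feature" idea, consider a hypothetical `Account` class (all names here are made up, and plain Java checks stand in for JUnit assertions so that the snippet is self-contained). Each test method makes a single check, and the shared preparation lives in one helper instead of being pasted around.

```java
// Hypothetical class under test, invented for this sketch.
final class Account {
    private int balance;

    void deposit(int amount) {
        balance += amount;
    }

    void withdraw(int amount) {
        if (amount > balance) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balance -= amount;
    }

    int balance() {
        return balance;
    }
}

final class AccountTest {
    // Each method would be a separate @Test in JUnit.
    static void depositIncreasesBalance() {
        Account account = new Account();
        account.deposit(100);
        check(100, account.balance());
    }

    static void withdrawDecreasesBalance() {
        Account account = accountWithBalance(100);
        account.withdraw(30);
        check(70, account.balance());
    }

    static void withdrawBeyondBalanceIsRejected() {
        Account account = accountWithBalance(100);
        try {
            account.withdraw(101);
        } catch (IllegalArgumentException expected) {
            return; // the feature under test: overdrafts are refused
        }
        throw new AssertionError("expected withdraw to be rejected");
    }

    // Shared preparation kept in one place.
    private static Account accountWithBalance(int amount) {
        Account account = new Account();
        account.deposit(amount);
        return account;
    }

    private static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

The withdraw tests still depend on `deposit` for their preparation, so a broken `deposit` turns them red too. The difference is that the suite also contains a test that checks nothing but `deposit`, and that is the one pointing at the root cause.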

Testing against implicit interfaces

Even if you clean up your testing code and refactor to reach a one feature/one test situation, you'll still experience some inertia due to testing code. This definitely smells: we were told that unit tests are supposed to help refactoring, allowing us to change the implementation while controlling behavior on the interface. The problem is that we are often doing it only in step 3 of the above list, while we are depending on the application's implicit interfaces when creating test objects and sometimes also when asserting the correctness of the result. Creating a test object might be a nontrivial process – especially if the application does not provide you with a standard way of doing it, like Factories or the like – and tends to be repeated all over the testing code. If you're depending on a convention, changing it will probably have a heavier impact.

In general, when writing a test, step 3 is very short. Basically just a line of code, depending on the interface you've chosen. Dependencies and coupling sneak in from test preparation and test verification, but you've got to keep it under control to avoid getting stuck by your test code base.
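
One way to keep that coupling under control is to give the test code a single place that knows how test objects are created. The sketch below uses a small builder for this. It is only an illustration under my own assumptions: `Customer`, `CustomerBuilder` and the discount rule are hypothetical, and plain Java checks replace JUnit assertions to keep the snippet self-contained.

```java
// Hypothetical domain class: its three-argument constructor is the
// "implicit interface" that tests tend to depend on.
final class Customer {
    private final String name;
    private final String country;
    private final boolean premium;

    Customer(String name, String country, boolean premium) {
        this.name = name;
        this.country = country;
        this.premium = premium;
    }

    int discountPercent() {
        return premium ? 10 : 0;
    }
}

// The only place in the test code that calls the constructor.
// If the constructor changes, only this class has to follow.
final class CustomerBuilder {
    private String name = "Any Name";   // defaults irrelevant to most tests
    private String country = "IT";
    private boolean premium = false;

    static CustomerBuilder aCustomer() {
        return new CustomerBuilder();
    }

    CustomerBuilder premium() {
        this.premium = true;
        return this;
    }

    Customer build() {
        return new Customer(name, country, premium);
    }
}

final class CustomerDiscountTest {
    static void premiumCustomerGetsTenPercent() {
        Customer customer = CustomerBuilder.aCustomer().premium().build();
        int actual = customer.discountPercent(); // step 3: one line
        if (actual != 10) {
            throw new AssertionError("expected 10 but was " + actual);
        }
    }

    static void regularCustomerGetsNoDiscount() {
        Customer customer = CustomerBuilder.aCustomer().build();
        int actual = customer.discountPercent();
        if (actual != 0) {
            throw new AssertionError("expected 0 but was " + actual);
        }
    }
}
```

Each test states only what matters to it (premium or not), while the creation convention stays in the builder.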


Wednesday, March 12, 2008

TDD vs Architecture debate

Some days ago, I was watching this video on InfoQ, where James Coplien and Robert C. Martin were discussing some undesired side effects of TDD, particularly on the architecture side. One of the key points was that testing code increases the overall weight of the code base, making it harder to eventually refactor the architecture.

Another interesting issue presented was that TDD doesn't necessarily enforce testing all the possible boundary conditions, but often ends up in a sort of heuristic testing, which is less effective than testing based on a design-by-contract assumption.
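
To give an idea of the difference, here is a small sketch of my own (it is not taken from the video). The `Discount` class and its contract, "percent must be between 0 and 100", are hypothetical. A heuristic test would probably check one comfortable value, such as 10%; tests derived from the contract go straight to its edges.

```java
// Hypothetical method with an explicit precondition: 0 <= percent <= 100.
final class Discount {
    static int apply(int priceInCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent out of range: " + percent);
        }
        return priceInCents - priceInCents * percent / 100;
    }
}

final class DiscountBoundaryTest {
    // Lower edge of the contract.
    static void zeroPercentLeavesPriceUnchanged() {
        check(1000, Discount.apply(1000, 0));
    }

    // Upper edge of the contract.
    static void hundredPercentMakesItFree() {
        check(0, Discount.apply(1000, 100));
    }

    // Just outside the contract, on both sides.
    static void valuesOutsideTheRangeAreRejected() {
        expectRejected(-1);
        expectRejected(101);
    }

    private static void expectRejected(int percent) {
        try {
            Discount.apply(1000, percent);
        } catch (IllegalArgumentException expected) {
            return; // precondition enforced
        }
        throw new AssertionError("expected " + percent + " to be rejected");
    }

    private static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```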

Honestly, the TDD book puts a lot of emphasis on efforts to remove duplication, also between production and testing code, but I have the impression that this portion of the message is often lost by test writers. I've got some ruminations on the topic that will probably make up enough stuff for some more posts in the following days.



Friday, March 07, 2008

Social Networking Patterns

I've had some interesting reactions to my post on Social Networking, which I wrote basically to apologize for making people waste their time. After concluding that social networking is probably some sophisticated IT warfare weapon developed to harm the productivity of the western countries, I've had an interesting conversation with Giulio Cesare Solaroli, the mind behind the Clipperz online password manager, about the fact that as platforms are becoming more open, intercepting users' behavioral patterns is a key concern for any social web application.

I am not quite sure whether the notion of pattern fits the situation exactly, but I blogged about it before, and then found the WikiPatterns site, which published a consistent catalog of behavioral patterns that are reflected in the shape of the information. There are more than 50 patterns and antipatterns just for a wiki, in a scenario with some evident boundaries, like

  • People are told to go to a wiki
  • The people working on a wiki are already some kind of group (development team, company, etc.)
  • They should share a common goal

A social networking tool such as LinkedIn, Naymz or Spock has a similar high-level goal, which is to provide some form of valuable knowledge as a result of individual contributions by the users, but it is far more open. Nobody asks you to go on a platform (well, … somebody invites you…), you're not necessarily part of the same group, and there is no such thing as "the common goal". I've asked myself "why do I keep my LinkedIn page updated?", and here are the answers.

  1. I like learning how a new tool works
  2. It's useful for my marketing as a freelancer
  3. It's useful for my job, because Web 2.0 and the like are part of my consulting portfolio
  4. I can't stand fake or incomplete information
  5. I hate writing CVs and LinkedIn looks like the right place to write information only once
  6. Vanity

There are probably some more reasons, but here we are talking only about the relationship between me and the tool. For some of my friends the reasons are completely different, and some others are not on LinkedIn and are not interested in joining. But the tool is a networking platform, and this means that a lot more variables and scenarios are possible. I'll list a few.

  1. What if somebody wants to connect with you and you don't know him?
  2. What if somebody wants to connect with you and you don't remember him?
  3. What if a friend connects with you but not in the right position?
  4. What if a friend endorses you for the wrong position?
  5. What if somebody asks for an endorsement?
  6. What if somebody endorses you, but you have no direct experience about the way he/she works?

Ok, one can develop some sort of "SocialNetiquette", but having to think about it is an undesired side effect (it wastes brain cycles). But the key point, at least for me, is that I couldn't come up with a consistent behavior. In other words, I don't give the same answer to the same question – after all, I am a consultant, so "It depends" is my mantra… As a result, some of my connections are strong, related to people I know well, that I've worked with and so on, but some are not. Are we abusing the tool? Or are we still using the tool the way it was intended? Or… does this question actually make sense?

A key argument about all Web 2.0 technologies is that providing strict rules about the way a tool is used is a losing approach. Tools should instead "follow" users' needs and ideas and transform themselves into something that wasn't exactly planned in the beginning. It's sort of seeding something and then taking care of what's growing. More realistically, LinkedIn can't ban users because they connected without knowing each other well enough (would you like to be interviewed by the LinkedIn police about your connections?), so its body of knowledge is made up of contributors who do not behave consistently (as individuals and as a crowd) and who post incomplete and sometimes wrong information. Yet it works.

I still have the feeling of being part of a big experiment, but according to The Hitchhiker's Guide to the Galaxy, this does not necessarily mean that I am stupid.