Escaping the Oubliette (Part 4) – The Litter Patrol

Reading time ~ 2 minutes

As promised in my last installment on oubliettes…

Your team might not be fully ready for the merciless refactoring encouraged by some agile approaches but this will help you stay heading in the right direction whilst keeping the delivery vs refactoring impacts balanced out.

The cost of change has a (roughly) logarithmic relationship to debt. I’ve seen first-hand how high-debt systems become almost impossible to change and it’s not pretty.

In a debt-ridden system we are eventually faced with a choice; refactor or replace. Eventually even once-newly-replaced systems build up debt and the refactor/replace choice returns. Craig Larman & Bas Vodde’s most recent book covers the debt/cost relationship brilliantly in the section on “legacy code”. They also describe the oubliette strategy of “Do No Harm” or as I call it – The Litter Patrol“.

This is a particularly powerful debt management approach as it’s both a prevention and reduction strategy.

Here’s the basic concept…

When working with an area of legacy code, you’re working in a particular “neighbourhood“. If that neighbourhood is untidy your care and attention to that neighbourhood is diminished. Much like the “broken windows” principle; once the damage seal is broken, neglect and further damage follow and overall code quality deteriorates rapidly.

So (without going overboard), every time you’re working in a particular neighbourhood, what can you do to clean up a few pieces of litter?

Not the run-down shopping district between lines 904 and 2897 but more the abandoned classic car between lines 857 and 892 or the overflowing trashcan on the corner between 780 and 804.

If you introduce a litter patrol in your teams and encourage a hygiene cycle with every change to your code, your debt load and future cost of change will rapidly reduce for the areas you hit most frequently.

Unfortunately although this is easier than a complete refactoring of a poorly designed class-hierarchy or monolithic god-class, in order to perform the litter patrol in safety on your code you need good unit tests or small functional tests for that neighbourhood and ideally some refactoring support (I like to call these your gloves, garbage bag and litter picker).

This challenge doesn’t mean don’t do it. In fact if you don’t have tests already, maybe your next patrol isn’t changing the code at all but to write just a couple of small, independent tests to demonstrate how that area is expected to behave. (You might even need to make some tweaks to make it testable)

If it’s hard, don’t avoid the problem, focus on making life easier one bite or constraint at a time.

Every time we successfully deliver a small clean-up task, the future cost of change to that area is reduced and our incentive to keep it clean is improved.

Look out for part 5 – sponsorship – coming soon.

Escaping the Oubliette (Part 2) – Tailing

Reading time ~ 4 minutes

Following my previous articles on topping and debt prevention, we’ll now focus on the easier parts. How to clear old defects and debt. we’ll cover 4 simple strategies; 2 for defects and 2 for debt. (there are paired similarities across these).

For defects, we’ll focus on:

For debt we’ll focus on:

Unlike prevention and stop the bleeding activities which often require significant effort and commitment to accomplish, defect and debt removal have some simple and relatively low effort options. There are some high-effort/impact alternatives that we’ll cover later.

Today we’ll look at…

Tailing & Ratcheting

When you have a build up of defects over time you tend to have a “tail” of really old defects and issues. These are usually low or medium severity/priority items (with the occasional blip) and are often in functionally gray areas requiring difficult decisions or significant rework.

They age because we don’t want to touch them and would rather forget about them (the oubliette again). A typical defect age distribution looks something like this:

right-skewed beta distribution

right-skewed beta distribution

It’s one of the most common statistical patterns I see in software development (and I’ll return to it in future) but for the purposes of this article, I want to look at the “tail” of this curve- that last 5-10% of your defect population – all your oldest items.

Addressing the tail is pretty straightforward. You can either set yourself a target “maximum defect age by a given date” or simply focus on continuous improvement. Whilst the target approach gives you a clear goal, you risk setting yourself up for failure or under-commitment. Defects are considered notoriously hard to size (I’ll cover this myth in future) but chances are if they’re old they’re probably a bit tricky too so being predictable about dates is something you might prefer to avoid for starters.

Whether you aim for a specific target or improving every day/week/month you’ll need the same stepped approach.

Identify your oldest defect, pick it off the end of the queue and commit to closing it quickly and fairly.

Here’s the first thing to accept… Closing doesn’t mean fix it in every case. Because you’ve not touched it for a while, chances are someone is expecting a solution so you’ll be looking at a difficult conversation if you don’t fix it but that might be far easier than “fixing” something that shouldn’t be changed or will derail your team and product. Make open, fair, honest decisions in each case.

  • If it is something you think you should be fixing, get on with it.
  • If it’s not – close it

Sound familiar?

Aim to close at least one of your oldest defects every week or every sprint.

If you have a lot of items that are considered old, consider increasing your capacity on these in the short term to get the ball rolling. (at the cost of other delivery activities)

If we look back to your distribution of defects over time, when you do close out your oldest defects, put a ratchet mechanism in place that sets a continuously reducing maximum age.  In very bad weeks the ratchet may not improve but don’t let the numbers get worse again or the effort and good faith from your teams and management will have been wasted.

With a ratchet, remember each “notch” is a step change from the last point. For example if your last point was 1000 days(!), clearing everything older than that should leave some defects nearly 1,000 days old. Set the next ratchet point to 950 days (rather than 997), determine what falls in that next block (950-999) and fix those. Ensure each ratchet point is a larger time window than your execution period otherwise you’ll end up stationary. (I usually go for a ratchet up of 50 days improvement per week or sprint).

Here’s what I mean…

Defects Tail

Ratcheting out the oldest defects each week

3 months of this tailing practice every week will dramatically reduce the average and maximum age of issues in your queue. It’ll make you all feel better and it usually helps to draw the heat off on escalations for a while. (Bear in mind you should be doing this in addition to your “topping” approaches).

Occasionally you’ll reopen some old wounds this way but paying these some attention now stops them coming back up when the timing isn’t under your control.

Keep this up for six months or a year and you’ll reach a stable point where nothing gets too old and your team becomes adept at having difficult conversations with your customers early and backs them up with some successes as well.

Eventually in order to keep the ratcheting improving you’ll have to commit more people. This probably means you’re approaching your natural level of control or entitlement where you have aging defects levelling off and others being addressed in a reasonably consistent manner. It’s up to you whether you aggressively pursue the numbers down further or sustain them at this level.

In case this sounds too obvious or simple to work, I’m working with a team right now that introduced this ratcheting approach less than a month ago. We’ve reduced our oldest defect age by over 25% and customer escalations are consistently lower already. Perhaps most important to the team; we’re no longer at the top of the charts when our leaders review the defect stats. Although harsh-sounding. Sometimes, having someone else drawing the heat for a while lets us get back to focusing on value and priorities.

In part 3 we’ll look at an old favourite – the “bug blitz” or feel free to skip to part 4 – “the litter patrol“.