Transactions for Managing Technical Debt

Reading time ~ 4 minutes

If you’re losing capacity maintaining brittle code or infrastructure, the “litter patrol” in a related neighborhood may be good practice but how do you visualize and manage the true source of your pain?

This is based on a mix of my own experiences and great nugget of insight picked up from Jim Highsmith at Agile 2010 who in turn credited Israel Gatt.

First the paraphrased nugget from Jim…

“Teams need both debt reduction and debt prevention strategies.”

Here’s the quote from Israel back in December ’09.

“If your company relentlessly pursues growth, the quality/technical debt liability it is likely to incur could easily outweigh the benefits of growth. Consider the upside potential of growth vis-a-vis the downside of the resultant technical debt. When appropriate, monetize technical debt using the technique described in Technical Debt on Your Balance Sheet.”

Here’s my expansion to Israel’s article…

How do we know we have debt?

We “know” it’s there, we feel it. But do the right people know about it? You may be allocating permanent team capacity to keep bailing without seeing the hole.

  • Train all your teams in spotting technical debt
  • Develop and communicate your debt strategy to your teams and stakeholders.
  • Determine a common unit of currency (points, hours, money, NUTs) that your stakeholders can understand and engage with and use this to describe the transaction decisions you’re making.

Communicate current debt – “Balance”

  • This is obvious but hard to achieve. For the debt that hurts; quantify the cost and determine what you could do if it were resolved. Get that opportunity cost shared and get your solution sponsored. In most large companies if you can demonstrate a cost reduction or productivity improvement you can get support.

Communicate new debt – “Recent Transactions”

  • Not the decade-old pile in the corner! Focus on what you’re forced to add because of deadlines and market pressure. Every time you make a debt-related compromise, get that conversation exposed. Determine the maximum acceptable age for any debt being added, a cost (in your selected currency) and a priority. Whilst there are cases where a debt may never need to be repaid, chances are you’ll need to pay the next transaction off.

Limit debt – Set yourself a “Credit Limit”

  • What’s the maximum debt your team/project/product is willing and/or able to tolerate? Set a credit limit and stick to it. If we blow our limit, we’re in trouble. My personal rule-of-thumb is if it’ll take more than one whole sprint for the entire team to clear all new debt then you’re worryingly-leveraged and heading away from releasable software. This approach means a simple default credit limit for a new project (using story points as currency) is the same as your velocity – simple to calculate and remember. You could convert this into the average hours (and therefore financial impact) a story of that size takes if that means more to your leaders.

Whatever you do, don’t over-mortgage your product. You don’t want your product portfolio to collapse under unrecoverable bad debt.

Set a maximum age threshold – “Payment Terms”

  • Use topping & tailing approaches to control and ratchet your debt age to keep the span down.  If your debt exists for more than N sprints (say 3) promote it to the top of the priority stack. Fixing new debt when it arrives is hard work and items do slip through. For those that escape, setting a maximum age means you’ll cycle all your debt through and keep it fresh. Even tough problems get resolved rather than rotting in the debt heap.

Visualize and report your debt – provide a “Debt Statement”

  • Put all debt visibly on your wall as cards or stickies, give it a different color, heading, whatever – make it visible. Now every sprint let’s highlight the numbers, the growth, reduction, reiterate your limit. If you’re using an electronic tool, try putting all the debt items under a single parent and tracking cumulative flow and burn-up of debt rolled up to that item.

Prioritize your debt – “Positions”

This is getting a little more advanced and difficult to manage – I’d reserve this for only projects that already have high debt where the simple strategies alone are too invasive.

  • Partition your debt into 3 positions. New:Short-term, New:Long-term, Old:All.

Set a different credit limit and payment terms for each of these and as part of prioritization set a ratio of effort/pay-off against each that ensures that at a minimum debt is sustained at a constant level but focus on cycling through so that the contents remain fresh.

Taking Action

Here are a few other points to explore – mostly around debt reduction.

• What “small wins” exist for you? Are there some simple debts than can be cleared quickly?

• If you sacrificed a team member for 1 sprint to pay off some short-term debt, would you be able to increase the overall team performance in later sprints?

• If a particular single debt item is too big for a team to swallow in a single sprint, can you call in or implement a parallel “SWAT Team” to fix it?

• What capacity do you need to reserve to sustain your current debt level?

• If you were going to reduce your debt burden by 10-20% what effort/capacity would that require and how long would it take?

• Can you demonstrate a return on investment for a particular piece of debt-reduction?

• Worst-case scenario: Could some debt control run “under the radar”? (And if this were discovered, what problems would be caused?)

Wrapping up:

Develop both debt prevention and reduction strategies for your teams, don’t just focus on reduction.

Treat technical debt like real personal or business debt: use “credit limits”, “statements”, “balance”, “transactions” & “payment terms” to your advantage.

To achieve effective reduction, work through the Oubliette strategies and in worst cases, consider a “SWAT Team”.

Escaping the Oubliette (Part 4) – The Litter Patrol

Reading time ~ 2 minutes

As promised in my last installment on oubliettes…

Your team might not be fully ready for the merciless refactoring encouraged by some agile approaches but this will help you stay heading in the right direction whilst keeping the delivery vs refactoring impacts balanced out.

The cost of change has a (roughly) logarithmic relationship to debt. I’ve seen first-hand how high-debt systems become almost impossible to change and it’s not pretty.

In a debt-ridden system we are eventually faced with a choice; refactor or replace. Eventually even once-newly-replaced systems build up debt and the refactor/replace choice returns. Craig Larman & Bas Vodde’s most recent book covers the debt/cost relationship brilliantly in the section on “legacy code”. They also describe the oubliette strategy of “Do No Harm” or as I call it – The Litter Patrol“.

This is a particularly powerful debt management approach as it’s both a prevention and reduction strategy.

Here’s the basic concept…

When working with an area of legacy code, you’re working in a particular “neighbourhood“. If that neighbourhood is untidy your care and attention to that neighbourhood is diminished. Much like the “broken windows” principle; once the damage seal is broken, neglect and further damage follow and overall code quality deteriorates rapidly.

So (without going overboard), every time you’re working in a particular neighbourhood, what can you do to clean up a few pieces of litter?

Not the run-down shopping district between lines 904 and 2897 but more the abandoned classic car between lines 857 and 892 or the overflowing trashcan on the corner between 780 and 804.

If you introduce a litter patrol in your teams and encourage a hygiene cycle with every change to your code, your debt load and future cost of change will rapidly reduce for the areas you hit most frequently.

Unfortunately although this is easier than a complete refactoring of a poorly designed class-hierarchy or monolithic god-class, in order to perform the litter patrol in safety on your code you need good unit tests or small functional tests for that neighbourhood and ideally some refactoring support (I like to call these your gloves, garbage bag and litter picker).

This challenge doesn’t mean don’t do it. In fact if you don’t have tests already, maybe your next patrol isn’t changing the code at all but to write just a couple of small, independent tests to demonstrate how that area is expected to behave. (You might even need to make some tweaks to make it testable)

If it’s hard, don’t avoid the problem, focus on making life easier one bite or constraint at a time.

Every time we successfully deliver a small clean-up task, the future cost of change to that area is reduced and our incentive to keep it clean is improved.

Look out for part 5 – sponsorship – coming soon.

Escaping the Oubliette (Part 3) – Bug Blitz

Reading time ~ 3 minutes

Every product team I’ve ever worked in had a bug blitz at some point and often one every year or two.

There’s no arguing that a decent bug blitz is a powerful way of getting the numbers down and clearing all the bugs in a good, solid drive feels good but the necessity for them is caused by a buildup from somewhere.

If you find you’re needing a bug blitz on every release, take a look at your defect and debt prevention activities and make sure you’ve got some topping & tailing practices going on.

Usually bug blitzes are performed at the end of a project but if you’ve not done so before, try having a 4-8 week blitz at the beginning instead. You’ll run quicker afterward and (if you keep things under control), you won’t have to worry about having time to mop up at the end.

Once you’ve got the numbers down, set your maximum defect threshold (or ratchet) at this level for the remainder of the project and keep this new lower level sustained throughout development.

What’s good about a bug blitz?

A blitz is particularly useful if you can focus on areas of the product you’ll be working with soon. It’ll get your team working together (particularly if they’re newly formed) and familiar with these areas before all the major work starts.

Couple this up with developing some decent automated tests in those areas as part of the defect fixing and you’ll be developing a much safer scaffold for your new work and reduce regression risks during your next release.

You could take things further and perform some refactoring but I suggest keeping different types of activities separate at this point and just stick to straight bugs. If you have a specific functional area needing a real overhaul, it doesn’t fit the bug blitz mold. I’ll cover this aspect  in more depth when I talk about “sponsorship” for debt reduction.

I use bug blitz approaches when training up new staff. I review the defect backlog for a particularly grubby functional area and have a pair of staff take custody of it as caretakers. We pipeline the defects so that they can start with some simple introductory ones (usually low severity, noise or cosmetic stuff) and once we get confident in these we expand out into adjacent areas – it usually takes a month or two for them to really get warmed up.

The bug blitz is also a great opportunity to start new release development with a clean slate. It means no mixing types of work during feature development and gives you an opportunity to scrub the grime out and get a few new scaffolding tests in place.

What about the down-side?

First; they cost time and money. If you dedicate a team (or most of a team) to a blitz you’re not delivering anything new. This is obvious but important. What’s the impact of a month’s delay to your next release? (assuming you can ship with those bugs in there). And what will that delay do to your stakeholder relationships?

Be mindful of the impact a bug blitz can have on your customers. A high level of churn on existing functionality can be really dangerous. You might need to make a point of jumping through a few hoops to retain backwards compatibility.

Don’t be too hasty to refactor if customers are expecting certain things. If you don’t have decent automated regression tests, you really need to ensure you write at least a couple that hit the same area before you fix any defects. (I usually expect my developers to deliver at least 2 or 3 new automated tests with each bug fix).

Worse still, I’ve seen a batch of fixes in a functional area radically change behavior “for the better” according to the developers that raised the original internal defects who then discovered that customers were actually depending on existing behavior or had developed their business processes around the issues.

Sometimes, even if you think it’s ugly, it might be that way for a good reason.

With enterprise software, you’re often looking at heavily customized implementations, some of which have taken years and millions of dollars to evolve.  Smacking these with a mountain of churn on existing functionality can be painful and expensive for your customers. Whilst it makes your numbers look good, consider whether some things should really be touched.

In summary. Bug blitzes are a great way of starting with a cleaner work area and ramping up teams but beware of backwards compatibility, customer impact, time and cost pitfalls.

Read part 4 – “the litter patrol

Escaping the Oubliette (Part 2) – Tailing

Reading time ~ 4 minutes

Following my previous articles on topping and debt prevention, we’ll now focus on the easier parts. How to clear old defects and debt. we’ll cover 4 simple strategies; 2 for defects and 2 for debt. (there are paired similarities across these).

For defects, we’ll focus on:

For debt we’ll focus on:

Unlike prevention and stop the bleeding activities which often require significant effort and commitment to accomplish, defect and debt removal have some simple and relatively low effort options. There are some high-effort/impact alternatives that we’ll cover later.

Today we’ll look at…

Tailing & Ratcheting

When you have a build up of defects over time you tend to have a “tail” of really old defects and issues. These are usually low or medium severity/priority items (with the occasional blip) and are often in functionally gray areas requiring difficult decisions or significant rework.

They age because we don’t want to touch them and would rather forget about them (the oubliette again). A typical defect age distribution looks something like this:

right-skewed beta distribution

right-skewed beta distribution

It’s one of the most common statistical patterns I see in software development (and I’ll return to it in future) but for the purposes of this article, I want to look at the “tail” of this curve- that last 5-10% of your defect population – all your oldest items.

Addressing the tail is pretty straightforward. You can either set yourself a target “maximum defect age by a given date” or simply focus on continuous improvement. Whilst the target approach gives you a clear goal, you risk setting yourself up for failure or under-commitment. Defects are considered notoriously hard to size (I’ll cover this myth in future) but chances are if they’re old they’re probably a bit tricky too so being predictable about dates is something you might prefer to avoid for starters.

Whether you aim for a specific target or improving every day/week/month you’ll need the same stepped approach.

Identify your oldest defect, pick it off the end of the queue and commit to closing it quickly and fairly.

Here’s the first thing to accept… Closing doesn’t mean fix it in every case. Because you’ve not touched it for a while, chances are someone is expecting a solution so you’ll be looking at a difficult conversation if you don’t fix it but that might be far easier than “fixing” something that shouldn’t be changed or will derail your team and product. Make open, fair, honest decisions in each case.

  • If it is something you think you should be fixing, get on with it.
  • If it’s not – close it

Sound familiar?

Aim to close at least one of your oldest defects every week or every sprint.

If you have a lot of items that are considered old, consider increasing your capacity on these in the short term to get the ball rolling. (at the cost of other delivery activities)

If we look back to your distribution of defects over time, when you do close out your oldest defects, put a ratchet mechanism in place that sets a continuously reducing maximum age.  In very bad weeks the ratchet may not improve but don’t let the numbers get worse again or the effort and good faith from your teams and management will have been wasted.

With a ratchet, remember each “notch” is a step change from the last point. For example if your last point was 1000 days(!), clearing everything older than that should leave some defects nearly 1,000 days old. Set the next ratchet point to 950 days (rather than 997), determine what falls in that next block (950-999) and fix those. Ensure each ratchet point is a larger time window than your execution period otherwise you’ll end up stationary. (I usually go for a ratchet up of 50 days improvement per week or sprint).

Here’s what I mean…

Defects Tail

Ratcheting out the oldest defects each week

3 months of this tailing practice every week will dramatically reduce the average and maximum age of issues in your queue. It’ll make you all feel better and it usually helps to draw the heat off on escalations for a while. (Bear in mind you should be doing this in addition to your “topping” approaches).

Occasionally you’ll reopen some old wounds this way but paying these some attention now stops them coming back up when the timing isn’t under your control.

Keep this up for six months or a year and you’ll reach a stable point where nothing gets too old and your team becomes adept at having difficult conversations with your customers early and backs them up with some successes as well.

Eventually in order to keep the ratcheting improving you’ll have to commit more people. This probably means you’re approaching your natural level of control or entitlement where you have aging defects levelling off and others being addressed in a reasonably consistent manner. It’s up to you whether you aggressively pursue the numbers down further or sustain them at this level.

In case this sounds too obvious or simple to work, I’m working with a team right now that introduced this ratcheting approach less than a month ago. We’ve reduced our oldest defect age by over 25% and customer escalations are consistently lower already. Perhaps most important to the team; we’re no longer at the top of the charts when our leaders review the defect stats. Although harsh-sounding. Sometimes, having someone else drawing the heat for a while lets us get back to focusing on value and priorities.

In part 3 we’ll look at an old favourite – the “bug blitz” or feel free to skip to part 4 – “the litter patrol“.

Escaping the Oubliette (Part 1a) – Debt Prevention

Reading time ~ 2 minutes

This is a partial re-post of Escaping the Oubliette (Part 1). I’ve split the article into smaller readable components.

Great, I’ve got my incoming defect strategy nailed,

Now how do I prevent defects and debt in new code?

In 5 words…

Continuous attention to technical excellence.

Here’s my top 7 (there are plenty more)

  1. Acceptance Criteria – Be really disciplined on your acceptance criteria & acceptance tests, team up with Analysts, Testers, Product Owners if you have them and attack your stories from every angle. A good approach to this is a “story kick-off” where the whole team dismantles a story before starting.
  2. Thinking Time – don’t just start coding right away, task things out, try the 10 minute test plan, discuss your approach with your peers and for more complex or large items, try the “just enough design” approach.
  3. TDD – It’s hard to start but has an immense impact.  I’ve just seen a team complete their first project using TDD. 3 weeks into their final round of post feature-complete testing, their defect run-rate hasn’t had the testing spike seen on prior projects. In fact they’re keeping on top of all new incoming defects and have time to start paying down the historic backlog.
  4. Pair Programming – Do it in half-day trial chunks if you don’t have the stomach for going full-tilt. I’ve performed remote pair-programming with colleagues across the Atlantic using decent phone headsets and online collaboration tools for hours at a time. The net result of 2 days of remote pairing was finding and fixing about 10 extra defects in a thousand lines of code that neither of us would have found coding alone.
  5. Peer reviews – there is still a huge space for these in agile teams. But here’s the thing. Be really tough. A peer review is not a hurdle. It’s a shared learning exercise. Functional correctness is actually the smallest component of a peer review. You should trust your developers that far. But there’s a whole series of other aspects to review. See the joy of peer reviews.
  6. Small tasks – I once worked with an outsourced team who when taking work would disappear into a hole for 2 weeks and return with a single task in our configuration management system containing edits to 200+ files and multiple condensed edits to the files. My rule of thumb is one reviewable task per activity. If you’re going to add new functionality and refactor, that’s 2 independent tasks that can be identified and reviewed separately. This means you should be able to easily deliver 2 reviewable, closable tasks per day.
  7. Fast Builds – make it insanely simple for a developer to perform an incremental build that validates new code against the latest main code line. (small tasks are a big help here).  This includes the right subset of unit and functional tests. Aim for a target of a 30 second response time or less between hitting the button and seeing the first results.

In the next article in this series I’ll focus on “Tailing” – How do you start reducing the old defects.

Escaping the Oubliette (Part 1) – Topping

Reading time ~ 2 minutes

As promised, how to Escape The Oubliette

So here you are, part of a team at the bottom of the defect & debt pit. What now?

The simple fact is, there’s no one answer but there are a selection of tools and there is a key.

The key comes courtesy of a comment from Jim Highsmith during one of his sessions at Agile 2010 – my paraphrasing below…

“Teams facing technical debt need both debt reduction and debt prevention strategies.”

When teams face technical debt they pick away, removing it piecemeal. After all, you have to eat an elephant one bite at a time right?

Trouble is, that only works if the elephant isn’t growing as fast as you’re chewing.

You need a two-pronged attack – what I like to call “topping and tailing”. The “top” is all the incoming issues and build-up of new debt. This is where you need to stop the bleeding. The “tail” is the backlog of existing debt. It’s already there – gently rotting in the corner of your product.

Topping

If you’ve read my post on stopping the bleeding, this is the crux of “topping”. You may have a variety of strategies for this but here’s the fundamentals.

Incoming Customer Defects

Ringfence enough capacity to address defects as soon as they come in. Make sure your burn rate matches your incoming rate. If you’re strapped, it’s your call whether you only cover high severity issues but for my money, I’d hit them all. A multi-year build-up of low severity issues makes products ugly, no pleasure for the users and a series of minor problems rapidly becomes a major burden and a big debt buildup.

Addressing every incoming defect doesn’t mean fixing every one. Whilst I generally recommend you do, when trying to dig yourself out of an oubliette the bigger issue is how to keep your goal achievable.

Be critical. The big, nasty bugs are usually pretty obvious but what about the rest?

    • Is it low severity and not important?
      • Is it quick & cheap to fix? – Just fix it.
      • If it’s not – Can you put a safety rail around it? Document it as a limitation or even ignore it?

Negotiate with your customers and make an explicit decision to fix now or never. Once decisions are made, communicated and agreed, close the defects in your backlog.

Getting these conversations happening immediately makes customers far happier than pretending they’ll just go away and never making a decision. (but not quite as happy as fixing their issues).

Incoming Internal Defects

The story here is the same. I’ve called it out independently as many companies and teams do.

Those incoming internal defects speak volumes about your approach to quality.

Get the discipline in place to fix them when you find them or be brutal and admit which ones will never be fixed but don’t pussyfoot around!

  • If you provide a service to other internal teams, treat them as customer defects.
  • If they’re raised by your own team, are they on new code or exposing existing issues?
    • On new code, fix them! – No excuses .
    • On existing code, the problems are already there. You need to decide if what you’ve done has made it more visible or harder to work with than before and make a judgement call. Fix now or never, resolve or close.

In the next article in this series I’ll cover debt prevention

The Oubliette

Reading time ~ 2 minutes

An oubliette is a particularly unpleasant dungeon characterized by a well-like opening. During medievil times these were used “to forget” (oublier -in French) about “unwanted guests”.

Imagine how you’d feel stuck at the bottom of an oubliette?

It’s probably one of the least inspiring places to be. Expect feelings of despair & hopelessness. You might find yourself asking “What do I do?”, “Why do I bother?”, “It’d be easier to just curl up and let it end”.

I’ve seen development teams end up in similar situations. Untamed defects build up around them over a period of years until it’s too late. The backlog is so deep there’s no hope of getting out. Even committing the entire team to just fixing bugs for over a year won’t save them.

It starts with a brittle codebase. A prior combination of poor architecture, lack of clear standards, lack of debt management and years of too much corner cutting. Often the underlying culprit is repetetive “business pressure” with a team that have not been empowered to say no. This mountain of cut corners and poor decisions offends the sensibilities of all but the most cavalier of developers but once the pit starts to get deep and squishy, what incentive is there to improve?

Your business cannot see delivery of features sacrificed to refactor the codebase, it adds no business value!

The gradual drip drip of quality problems continue as your ability to keep up with demand for new features slowly leaches away, the team slows down further, it costs more and takes longer to develop as the business piles on the pressure for even more new features in less time.

The oubliette spirals towards oblivion and like a prisoner at the bottom of the well, your team and codebase become starved of quality.

Don’t let this happen to you.

(Here’s the first part of how to escape and prevent the oubliette)