Transactions for Managing Technical Debt

Share
Reading time ~4 minutes

If you’re losing capacity maintaining brittle code or infrastructure, the “litter patrol” in a related neighborhood may be good practice but how do you visualize and manage the true source of your pain?

This is based on a mix of my own experiences and great nugget of insight picked up from Jim Highsmith at Agile 2010 who in turn credited Israel Gatt.

First the paraphrased nugget from Jim…

“Teams need both debt reduction and debt prevention strategies.”

Here’s the quote from Israel back in December ’09.

“If your company relentlessly pursues growth, the quality/technical debt liability it is likely to incur could easily outweigh the benefits of growth. Consider the upside potential of growth vis-a-vis the downside of the resultant technical debt. When appropriate, monetize technical debt using the technique described in Technical Debt on Your Balance Sheet.”

Here’s my expansion to Israel’s article…

How do we know we have debt?

We “know” it’s there, we feel it. But do the right people know about it? You may be allocating permanent team capacity to keep bailing without seeing the hole.

  • Train all your teams in spotting technical debt
  • Develop and communicate your debt strategy to your teams and stakeholders.
  • Determine a common unit of currency (points, hours, money, NUTs) that your stakeholders can understand and engage with and use this to describe the transaction decisions you’re making.

Communicate current debt – “Balance”

  • This is obvious but hard to achieve. For the debt that hurts; quantify the cost and determine what you could do if it were resolved. Get that opportunity cost shared and get your solution sponsored. In most large companies if you can demonstrate a cost reduction or productivity improvement you can get support.

Communicate new debt – “Recent Transactions”

  • Not the decade-old pile in the corner! Focus on what you’re forced to add because of deadlines and market pressure. Every time you make a debt-related compromise, get that conversation exposed. Determine the maximum acceptable age for any debt being added, a cost (in your selected currency) and a priority. Whilst there are cases where a debt may never need to be repaid, chances are you’ll need to pay the next transaction off.

Limit debt – Set yourself a “Credit Limit”

  • What’s the maximum debt your team/project/product is willing and/or able to tolerate? Set a credit limit and stick to it. If we blow our limit, we’re in trouble. My personal rule-of-thumb is if it’ll take more than one whole sprint for the entire team to clear all new debt then you’re worryingly-leveraged and heading away from releasable software. This approach means a simple default credit limit for a new project (using story points as currency) is the same as your velocity – simple to calculate and remember. You could convert this into the average hours (and therefore financial impact) a story of that size takes if that means more to your leaders.

Whatever you do, don’t over-mortgage your product. You don’t want your product portfolio to collapse under unrecoverable bad debt.

Set a maximum age threshold – “Payment Terms”

  • Use topping & tailing approaches to control and ratchet your debt age to keep the span down.  If your debt exists for more than N sprints (say 3) promote it to the top of the priority stack. Fixing new debt when it arrives is hard work and items do slip through. For those that escape, setting a maximum age means you’ll cycle all your debt through and keep it fresh. Even tough problems get resolved rather than rotting in the debt heap.

Visualize and report your debt – provide a “Debt Statement”

  • Put all debt visibly on your wall as cards or stickies, give it a different color, heading, whatever – make it visible. Now every sprint let’s highlight the numbers, the growth, reduction, reiterate your limit. If you’re using an electronic tool, try putting all the debt items under a single parent and tracking cumulative flow and burn-up of debt rolled up to that item.

Prioritize your debt – “Positions”

This is getting a little more advanced and difficult to manage – I’d reserve this for only projects that already have high debt where the simple strategies alone are too invasive.

  • Partition your debt into 3 positions. New:Short-term, New:Long-term, Old:All.

Set a different credit limit and payment terms for each of these and as part of prioritization set a ratio of effort/pay-off against each that ensures that at a minimum debt is sustained at a constant level but focus on cycling through so that the contents remain fresh.

Taking Action

Here are a few other points to explore – mostly around debt reduction.

• What “small wins” exist for you? Are there some simple debts than can be cleared quickly?

• If you sacrificed a team member for 1 sprint to pay off some short-term debt, would you be able to increase the overall team performance in later sprints?

• If a particular single debt item is too big for a team to swallow in a single sprint, can you call in or implement a parallel “SWAT Team” to fix it?

• What capacity do you need to reserve to sustain your current debt level?

• If you were going to reduce your debt burden by 10-20% what effort/capacity would that require and how long would it take?

• Can you demonstrate a return on investment for a particular piece of debt-reduction?

• Worst-case scenario: Could some debt control run “under the radar”? (And if this were discovered, what problems would be caused?)

Wrapping up:

Develop both debt prevention and reduction strategies for your teams, don’t just focus on reduction.

Treat technical debt like real personal or business debt: use “credit limits”, “statements”, “balance”, “transactions” & “payment terms” to your advantage.

To achieve effective reduction, work through the Oubliette strategies and in worst cases, consider a “SWAT Team”.

5S Your Scrum Board – (Part 2) – In depth

Share
Reading time ~8 minutes

I’m a big fan of Goldratt’s Theory of Constraints for getting important improvements made in the most suitable order for a team. Our current scrum board is the result of focusing on the most evident, soluble problem each day for a couple of weeks. No grand plan, just good emergent design.

Note this is a very long article as some readers of part 1 asked for all the details as soon as I could get them written, today’s the first time I’ve been near my blog in a week. I’ll switch back to shorter posts after this for a while.

Here’s an in depth look at each of the areas of our board…

scrum board with placeholders

Using the peripheral space for useful stuff

 

Feedback

A single sticky with a quote regarding some feedback. The last 3 sprints we’ve left one up there that says “It shouldn’t be all that hard” . A reminder not to generalize (“it” was much harder)

Theme

One sticky. Our current theme is “Contact Management”, The next will be “Dashboarding” or similar. Just a little reminder of what the overall focus of the sprint or sprints is. This helps when clarifying or controlling scope, questions, debt & defects.

Capacity

During sprint planning we look at who’s in, when and what other commitments they have (beyond the usual overheads and maintenance activity). Capacity is calculated in half-day (4-hour) increments for each of us, totalled up as a whole team in hours. We then subtract 40% for overheads & support.

We track a second capacity number under the first which is the actual total effort hours of tasks completed in the last sprint. Generally this is a fair bit lower than the forecast.

The differences between these two figures are useful input for understanding our capabilities & overheads.

We limit tasks during sprint planning to around 70% of our maximum delivery capacity or close to our previous delivery capability  in order to leave space for unknown / discovered activities and hurdles. If planning another story would take us over this level, we hold off until we’ve actually been able to deliver the first – there’s no point in breaking stories down into tasks if they’re highly unlikely to happen in the next 2 weeks.

If we really do well, we can always reconvene.

Tags

At the moment we have 2 sets of tags on the board; “Blocked” and “Please Test”. The team no longer uses the “please test” tags so they’re likely to be removed this week. We assume coding tasks include developer-led testing activity and any other testing is planned in as scheduled tasks. This has reduced dev-test handover as the whole team focuses on the set of tasks required to complete a story, not their individual tasks.

I’m likely to add “red tags” soon for tasks that need particular focus or expediting. We’ll limit the use of red tags to one WIP item at a time and only when needed.

Key

A color key for our tasks. Currently we have

  • blue : code & test
  • green : design & ux
  • yellow : test
  • orange : management noise
  • pink : support & bugs

Stories

 The top horizontal swimlane of our board contains stories and story activities.  The left-most column of this contains story cards in priority order from top to bottom for stories planned for this sprint. The space here is deliberately small. The last 2 sprints we had only 2 stories listed, in the next sprint we’re actually reducing it to 1 and having the whole team pull single stories through. (Update – We ended up pulling single stories as a flow after this point right to the end of the project)

“Extra”

A placeholder for “extra” stories.

If we really do clear all the stories on the board, this area is a placeholder in rough priority order for the next 1 or 2 stories we might play. These are not broken down into tasks unless there’s certainty that we’ll actually work on them in this sprint.

Physical Holders

A more traditional lean technique. We have an individual holder for 1-2 pads of each color of stickies. We can quickly see if any are running low and replenish stock. We also have a holder for story cards although that’s rarely used (we generally don’t add stories when stood at the board).

There’s also a holder for pens. We ensure there’s enough marker pens in the holder for everyone to have one during our standup and we ensure that there’s one of each whiteboard pen color needed for drawing graphs and totals on the board.

Hours (on stories)

This is the initial estimate in effort hours of all tasks we identified prior to & during sprint planning that went onto the board at the start of the sprint. We compare these with total hours completed at the end of the sprint to learn from and improve our tasking and estimation.

Over time we hope to gather enough data to see the typical ranges of hours spent on stories of each size (we expect a bit of overlap)

Sprint Tasks

Our second main horizontal swimlane is “sprint tasks”. These are all the activities required by us as a team in order to be able to complete the sprint and usually deploy our working software to production.  We also tend to add extra technical debt cleanup items into this swimlane.

This approach seems better at the moment than having specific (and artificial) stories for these activities.

It’s up to us how much trade-off we have between sprint tasks and story tasks in a sprint. Since creating this swimlane we’re finding 40-60% of our effort is going into this area however this is because we’ve been prioritising some technical debt and deployment improvement activities. That rate will adjust over the next couple of sprints.

Support/Bugs

The bottom swimlane on our board is for support issues and bugs. Right now we’re not using this very much as our support and old bug fixing activity has been relatively low.

We’ll see how useful this lane is longer-term.

Note, bugs related to work we’re actually doing are either fixed immediately and not added to the board at all or are added into the stories swimlane relevant to the story they’re found in.

Todo/Next/WIP/Done

Todo, WIP and Done are the typical status lanes for tasks you see on most boards these days. “Next” is an addition we added in very early on and has been a bit of a game-changer in managing the flow of tasks effectively here’s a link to the full article on the “next” column.

Retrospective Input

I talked a little bit about this at the UK Agile Coaches Gathering this weekend and I’ll expand more in a standalone article shortly. Put simply, we found that getting potential retrospective input visible every day during the daily standup has been more effective in driving continuous improvement and makes it easier to collate retrospective data at the end of the sprint. Mike Pearce also recommended something similar over the weekend – maintain your release and sprint timeline with pitfalls and lessons so that there’s less head-scratching at the end.

Graphs:

We’re tracking burn-down of task hours for both sprint tasks and story tasks in parallel (different colors). Then on the same graph we’re also tracking the burn-up of total capacity. It’s working okay for us at the moment but I can’t help thinking it’s still not quite right so I’m sure we’ll revisit this.

Beneath the burn-up/down we’re tracking per-sprint velocity as a basic bar chart. Straightforward, simple, easy to complete and easy to understand.

Sprint Schedule / Countdown boxes

As of the end of the last sprint we removed the sprint schedule, the list of historic dates wasn’t adding anything for us. We have the current sprint dates at the top-left of the board instead and have used this space for a set of countdown blocks 10..1 with effort hours to do and forecast remaining each day. This gives us immediate feedback as to whether we may have a problem in meeting the end of the sprint with current task scope. I’ll write up the experience on the countdown boxes separately once we’ve been using them a while.

“Done Done”

This serves a couple of subtle purposes…

There’s something very satisfying about taking tasks from the board and dumping them in the “done done” box.

We clear down the board into “done done” at the end of each sprint. This makes it clear that we’re not really done until we’ve completed everything and cleared the decks.

Although the box itself is basically a trash holder, it serves as a constant reminder that we’re not done until we’re done :)

We’ve added a few more things since I drew the original picture of our board…

Running Totals

After the daily stand up we sum up the total effort hours for each horizontal/vertical swimlane square/block and scribble those on the board. We can then transpose these into tracking graphs.

Right now we only track todo by horizontal lane and overall done but we have the data available for cumulative flow tracking if we feel the need.

Debt

We’ve added a very small space for newly accumulated or discovered debt related to our current work. It’s deliberately small to keep things minimal. When it fills up we’ll need to do something with it (if not sooner). It gives us one more line of slack if things go unexpectedly in a sprint for some reason whilst we still focus on deploying working software. This is still a running experiment but the intention is to treat this like credit card debt – it’s a short-term postponement mechanism, not a backlog.

Next Sprint Tasks

In much the same way as we have “extra” stories, we’ve added a placeholder for tasks we need to do in the following sprint but don’t have capacity for right now. This stops us overloading the current sprint tasks lane once things start moving without forgetting important future activities.

An Update (November 2013)

I’ve been rooting through my old photos and found a picture of the board toward the end of the project (September 2012) so you can see what it *actually* looked like as it evolved beyond what I’ve described above.DevOps Email Marketing Project Board Sept 2012

If you’re interested in even more whiteboard designs (and how they evolved over time) take a look at “A Year of Whiteboard Evolution”.

Closing Comments…

Don’t be afraid to adjust your working area if things don’t seem to be flowing right.

This isn’t the end for our board adjustments, what we’re using will certainly be different in a few weeks time however today it serves our needs well and must remain simple enough for new team members to understand.

I strongly recommend reading Henrik Kniberg’s recent book. I read the review draft last week (July 2011) and saw close similarities in the board structures the PUST teams at RPS were using and what we have here. I also share the same view as Henrik that each team – even within the same group – should be free to choose their own board style.

Try taking a few of these ideas away to experiment with but you probably won’t need them all.

Escaping the Oubliette (Part 4) – The Litter Patrol

Share
Reading time ~2 minutes

As promised in my last installment on oubliettes…

Your team might not be fully ready for the merciless refactoring encouraged by some agile approaches but this will help you stay heading in the right direction whilst keeping the delivery vs refactoring impacts balanced out.

The cost of change has a (roughly) logarithmic relationship to debt. I’ve seen first-hand how high-debt systems become almost impossible to change and it’s not pretty.

In a debt-ridden system we are eventually faced with a choice; refactor or replace. Eventually even once-newly-replaced systems build up debt and the refactor/replace choice returns. Craig Larman & Bas Vodde’s most recent book covers the debt/cost relationship brilliantly in the section on “legacy code”. They also describe the oubliette strategy of “Do No Harm” or as I call it – The Litter Patrol“.

This is a particularly powerful debt management approach as it’s both a prevention and reduction strategy.

Here’s the basic concept…

When working with an area of legacy code, you’re working in a particular “neighbourhood“. If that neighbourhood is untidy your care and attention to that neighbourhood is diminished. Much like the “broken windows” principle; once the damage seal is broken, neglect and further damage follow and overall code quality deteriorates rapidly.

So (without going overboard), every time you’re working in a particular neighbourhood, what can you do to clean up a few pieces of litter?

Not the run-down shopping district between lines 904 and 2897 but more the abandoned classic car between lines 857 and 892 or the overflowing trashcan on the corner between 780 and 804.

If you introduce a litter patrol in your teams and encourage a hygiene cycle with every change to your code, your debt load and future cost of change will rapidly reduce for the areas you hit most frequently.

Unfortunately although this is easier than a complete refactoring of a poorly designed class-hierarchy or monolithic god-class, in order to perform the litter patrol in safety on your code you need good unit tests or small functional tests for that neighbourhood and ideally some refactoring support (I like to call these your gloves, garbage bag and litter picker).

This challenge doesn’t mean don’t do it. In fact if you don’t have tests already, maybe your next patrol isn’t changing the code at all but to write just a couple of small, independent tests to demonstrate how that area is expected to behave. (You might even need to make some tweaks to make it testable)

If it’s hard, don’t avoid the problem, focus on making life easier one bite or constraint at a time.

Every time we successfully deliver a small clean-up task, the future cost of change to that area is reduced and our incentive to keep it clean is improved.

Look out for part 5 – sponsorship – coming soon.

Escaping the Oubliette (Part 3) – Bug Blitz

Share
Reading time ~4 minutes

Every product team I’ve ever worked in had a bug blitz at some point and often one every year or two.

There’s no arguing that a decent bug blitz is a powerful way of getting the numbers down and clearing all the bugs in a good, solid drive feels good but the necessity for them is caused by a buildup from somewhere.

If you find you’re needing a bug blitz on every release, take a look at your defect and debt prevention activities and make sure you’ve got some topping & tailing practices going on.

Usually bug blitzes are performed at the end of a project but if you’ve not done so before, try having a 4-8 week blitz at the beginning instead. You’ll run quicker afterward and (if you keep things under control), you won’t have to worry about having time to mop up at the end.

Once you’ve got the numbers down, set your maximum defect threshold (or ratchet) at this level for the remainder of the project and keep this new lower level sustained throughout development.

What’s good about a bug blitz?

A blitz is particularly useful if you can focus on areas of the product you’ll be working with soon. It’ll get your team working together (particularly if they’re newly formed) and familiar with these areas before all the major work starts.

Couple this up with developing some decent automated tests in those areas as part of the defect fixing and you’ll be developing a much safer scaffold for your new work and reduce regression risks during your next release.

You could take things further and perform some refactoring but I suggest keeping different types of activities separate at this point and just stick to straight bugs. If you have a specific functional area needing a real overhaul, it doesn’t fit the bug blitz mold. I’ll cover this aspect  in more depth when I talk about “sponsorship” for debt reduction.

I use bug blitz approaches when training up new staff. I review the defect backlog for a particularly grubby functional area and have a pair of staff take custody of it as caretakers. We pipeline the defects so that they can start with some simple introductory ones (usually low severity, noise or cosmetic stuff) and once we get confident in these we expand out into adjacent areas – it usually takes a month or two for them to really get warmed up.

The bug blitz is also a great opportunity to start new release development with a clean slate. It means no mixing types of work during feature development and gives you an opportunity to scrub the grime out and get a few new scaffolding tests in place.

What about the down-side?

First; they cost time and money. If you dedicate a team (or most of a team) to a blitz you’re not delivering anything new. This is obvious but important. What’s the impact of a month’s delay to your next release? (assuming you can ship with those bugs in there). And what will that delay do to your stakeholder relationships?

Be mindful of the impact a bug blitz can have on your customers. A high level of churn on existing functionality can be really dangerous. You might need to make a point of jumping through a few hoops to retain backwards compatibility.

Don’t be too hasty to refactor if customers are expecting certain things. If you don’t have decent automated regression tests, you really need to ensure you write at least a couple that hit the same area before you fix any defects. (I usually expect my developers to deliver at least 2 or 3 new automated tests with each bug fix).

Worse still, I’ve seen a batch of fixes in a functional area radically change behavior “for the better” according to the developers that raised the original internal defects who then discovered that customers were actually depending on existing behavior or had developed their business processes around the issues.

Sometimes, even if you think it’s ugly, it might be that way for a good reason.

With enterprise software, you’re often looking at heavily customized implementations, some of which have taken years and millions of dollars to evolve.  Smacking these with a mountain of churn on existing functionality can be painful and expensive for your customers. Whilst it makes your numbers look good, consider whether some things should really be touched.

In summary. Bug blitzes are a great way of starting with a cleaner work area and ramping up teams but beware of backwards compatibility, customer impact, time and cost pitfalls.

Read part 4 – “the litter patrol

Priority Fatigue

Share
Reading time ~2 minutes

This came to me at 5am after a bout of insomnia…

In the last few years the concept of technical debt has really taken root. Teams discuss it and use it to ensure important leftovers get cleared, not just business-critical priorities.

Here’s a fresh verbal anchor…

“Priority Fatigue” – The wear that sets in if you do nothing but focus on the priorities of your products or leaders all the time.

If you’re using Scrum or Kanban; chances are you’re working through some kind of prioritized backlog of work. Most Scrum practitioners are aware that despite iterations being called sprints the team are actually running a marathon. Every now and again your team needs to take a breather and at the end of a release they need proper recovery time.

Clearing technical debt is a common way of recovering. Another approach used by forward-thinking organizations is to have periodic innovation days or weeks where everyone “downs tools” and does something interesting instead. Good team-building days or activities are a third option.

These are all ways of addressing priority fatigue on a team.

Weekends and holidays are the personal slack that we use to pay off some of our individual priority fatigue however many of us don’t actually rest any more.

Our lives are so full we don’t have time to recover. In fact, many people now continue (at least partially) working even whilst on holiday – it’s frequently expected these days.

In the same way we relieve priority fatigue for teams, consider taking time to step back reward ourselves as individuals in innovative ways. If nothing else; take some regular time out to do something interesting even if it’s not important. **

**Caveat: don’t overdo it! Strike a balance with your priorities.