Using Every Tool in the Box

Reading time ~5 minutes

At one of my former employers it was mandatory for every engineer and manager to be Lean Six Sigma Certified to at least Green Belt level. Most executive managers also needed to be Black Belt certified.

Don't worry, this Ninja is 'armless

(One of my team on my last project is a real black belt – in Aikido – he finds the copying of terms somewhat problematic. I’m not going to argue with him!)

The company was heavily manufacturing-oriented; producing machinery for oil & gas, jet engines, power station systems, white goods, medical equipment and more. You name it, they probably made it somewhere around the world.

The Lean aspect of the certification was a later addition but made it worthwhile – at least for me. However most of the “engineers” I worked with at my main site were purely software engineers. (A few of our other sites I worked with had software, firmware and hardware for the same products under one roof). The software teams didn’t see the value in a (rather rigorous) mandatory certification – and given the way it was taught and introduced I don’t blame them.

A Green Belt certification requires a week of training, book study, completion of an exam and delivery of a full end-to-end process improvement project following one of the Lean Six Sigma improvement cycles (typically DMAIC – Define, Measure, Analyze, Improve & Control). Beyond this, the DMAIC project has to use a broad selection of statistical process control and measurement tools for each of about 15 steps throughout the project.

Certification candidates had to identify a real (but small enough to be achievable) process improvement project on their site and deliver it through this process. A challenge with our software teams is that many felt they were already following continuous improvement practices and that the overweight nature of the certification projects was entirely unsuitable for the work and improvements they were doing.

In many ways they were right.

The problem was that nobody ever explained why the projects needed to be so rigorous and what eventual value they would see. All they were told was “By becoming Lean Six Sigma certified you can talk with staff and managers across the company at any level using a common language and understanding.”

That’s a great ideal – except that their management chain already spoke the language of software.

As a manager at the site, I too had to run a DMAIC project. It took nearly a year to complete as I had to use “borrowed time” on top of all the other projects and initiatives we were all responsible for. Funnily enough it was only after I left that company that I discovered how valuable the experience was.

Picture working in a garage for a high-end specialist motor vehicle company.

You have an array of tools in front of you. As a novice you don’t know how to use anything but a spanner. You barely recognise what some of the tools even do but as an apprentice, you’re trained up, taught the proper way to use each tool and in what situation and context. (You don’t – after all – use a sledgehammer to crack a nut).

This is where my Six Sigma training and certification failed significantly.

You’re required to prove you can use all the tools in the box but nobody actually explains that once you’ve learned how to use them, you only need to use the right tools at the right time.

As a result, the indoctrination of Six Sigma churned out a high number of people that believed it was mandatory for every process improvement project to follow the same rigorous steps and use all the tools in the box. (I believe this culture has changed significantly now though.)

The problem was exacerbated further by introducing a specific Six Sigma career path. Staff could apply for “Black Belt” roles that were a 2 year rotation through various parts of the business. These were amazing roles focused on delivering rigorous process improvement programs all over the world; however, they had some dangerous flaws.

The 2 year rotation was actually the time it took to complete a Black Belt certification. Much like the Green Belt, it required completion of projects. In this case, two major process improvements for a business. Plus teaching and facilitating Six Sigma and Lean events.

Most (but not all) staff in Black Belt roles were not actually certified yet. Their personal goal for the 2 year rotation was to attain their certification such that they could move on to become a “Lean Leader” or similar roles elsewhere.

Imagine the distortions in behaviour you’d see if the primary motivation for the person leading your major process improvement program was to complete 2 projects in 2 years in order to attain certification.

I personally experienced one particularly bad instance toward the end of a project at my site where a team of unqualified Black Belts were parachuted in to “fix” our out of control defect backlog. On the whole, these people were not experts – but they knew how to work the system.

In this particular case, toward the end of the project, when it became clear their original quality target was not physically achievable, they actually moved the goalposts.

Rather than a defect fix being declared “done” when shipped and accepted by a customer, “done” was redefined to mean “code complete”. They fast-tracked nearly a thousand bug fixes in 6 months through engineering (and our outsourced partner) with no capacity for test and release resulting in significantly more code churn than our customers were willing to tolerate in the same time frame. The development teams couldn’t actually ship the fixes to customers and at the end of the project, the Black Belt team were hailed as heroes for “fixing” the problem and process.

Statistically, according to their measurement system, they had!

The irony is, the new process they introduced was great. They used a construction and manufacturing management concept known as “Line of Balance” to plan and deliver fixes according to customer “want dates”. If the team had had the freedom to ship specific releases to the right customers, and customers had been able to install them, it would have worked amazingly well. However, their analysis stopped within the building, without awareness that those customers generally had many years and millions of dollars’ worth of product customizations, meaning they were unable to take “raw” product releases any more.

They missed the critical context of our customers and users.

Anyway… I digress!

It’s worth learning how to use as many tools and techniques as you can. Understanding the powerful combinations available to you across different theories and practices allows you to combine and apply that knowledge in valuable and novel ways. 

But just because you know how to use all those tools, doesn’t mean you need them all the time – choose the right tool and process for the job at hand and remember to keep things as simple as possible.



The Pitfalls of Measuring “Agility”

Reading time ~7 minutes

This post expands on one of the experiences I mentioned in “Rapunzel’s Ivory Tower“.

I presented these lessons and the story at Agile Cambridge back in 2010.  It’s taken nearly 5 years to see the light of day in writing on here. I hope it’s not too late to make a difference.

I and my team hadn’t been in our roles long. We’d been given a challenge. Our executives wanted to know “which teams are agile and which aren’t” (see the Rapunzel post for more). We managed to re-educate them and gain acceptance of a more detailed measurement approach (they were all Six Sigma certified – these people loved measurement) and I’d been furiously pulling the pieces together so that when we had the time to work face to face we could walk away with something production ready.

Verging on quitting my job I asked James Lewis from Thoughtworks for a pint at The Old Spring. I was building a measurement system that was asking the right questions but there was no way I could see a path through it that would prevent it being used to penalize and criticise hard-working teams. This was a vital assessment for the company. It defined clearly the roadmap we’d set out, took a baseline measure of where we were and allowed us and teams to determine where to focus.

My greatest frustration was that many of the areas teams would score badly were beyond their immediate control – yet I knew senior management would have little time to review anything but the numbers.

James’ question he left me with was:

“How do you make it safe for teams to send a ‘help’ message to management instead?”

I returned to my desk fuelled by a fresh pair of eyes and a pint of cider. I had it!

At the time many agility assessments had two major flaws.

1 – they only have a positive scale – they’re masking binary answers making them look qualitative but they’re not.

2 – They assume they’re right – authoritative, “we wrote the assessment, we know the right answers better than you…”

  • What if the scale went up to 11 (metaphorically)? How could teams beat the (measurement) system?
  • And what if 0 wasn’t the lowest you could score. What would that mean?

The assessment was built using a combination of a simpler and smaller agility assessment provided to us by Rally plus the “Scrum Checklist” developed by Henrik Kniberg, the “Nokia Test“, the XP “rules” and my own specific experiences around lightweight design that weren’t captured by any of these. As we learned more, so we adapted the assessment to bake in our new knowledge.  This was 2009/2010, the agile field was moving really fast and we were adding new ideas weekly.

The results were inspired – a 220 question survey covering everything we knew. Radar charts, organizational heat maps, the works.

The final version of the assessment (version 27!) covered 12 categories with an average of about 18 questions to score in each category:

  1. Shared responsibility & accountability
  2. Requirements
  3. Collaboration & communication
  4. Planning, estimation & tracking
  5. Governance & Assurance
  6. Scrum Master
  7. Product Owner
  8. Build and Configuration Management
  9. Testing
  10. Use of tools (in particular Rally)
  11. Stakeholder trust, delivery & commitment
  12. Design

The most valuable part was the scale:

  • -3 We have Major systemic and/or organizational impediments preventing this (beyond the team’s control)
  • -2 We have impediments that require significant effort/resource/time to address before this will be possible (the team needs support to address)
  • -1 We have minor/moderate impediments that we need to resolve before this is possible (within the team’s control)
  • 0 We don’t consider or do this / this doesn’t happen (either deliberate or not)
  • 1 We sometimes achieve this
  • 2 We usually achieve this
  • 3 We always achieve this (*always*)
  • 4 We have a better alternative (provide details in comments)

a radar chart from excel showing each category from the assessment

The assessment was designed as a half-day shared learning experience. For any score less than 3 or 4, we would consider & discuss what should be done and when, what were the priorities, where did the team need support, what could teams drive themselves and what were the impediments. Teams could also highlight any items they disagreed with that should be explored.
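To make the intent of the scale a little more concrete, here is a minimal Python sketch of how a single score could be turned into a "who needs to act" label. This is purely illustrative: the function names `triage` and `items_to_discuss` and the label wording are hypothetical, and this is not how the original assessment was implemented.

```python
# Hypothetical sketch: map a score on the -3..4 scale to who needs to act.
# The labels are shorthand for the scale descriptions above.

def triage(score: int) -> str:
    """Return a short label for a single question's score."""
    if not -3 <= score <= 4:
        raise ValueError(f"score must be between -3 and 4, got {score}")
    if score == -3:
        return "escalate: systemic impediment beyond the team's control"
    if score == -2:
        return "support: team needs help to remove the impediment"
    if score == -1:
        return "team: impediment within the team's control"
    if score == 4:
        return "learn: team has a better alternative, capture the details"
    if score == 3:
        return "ok: always achieved"
    return "discuss: not done, or not done consistently"  # scores 0, 1, 2


def items_to_discuss(scores: dict[str, int]) -> dict[str, str]:
    """Keep every question scoring below 3, labelled by who needs to act."""
    return {question: triage(s) for question, s in scores.items() if s < 3}
```

The point of the negative half of the scale is visible in the labels: a low score is a request for help aimed at a specific audience, not a mark against the team.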

Actions were classified as:

  • Important but requires management support / organizational change to achieve
  • Useful, low effort required but requires more change support than low hanging fruit
  • Potential “low hanging fruit”, easy wins, usually a change in practice or communication
  • Important but requires significant sustained effort and support to improve

As a coaching team we completed one entire round of assessments across 14 sites around the globe and many teams then continued to self-assess after the baseline activity.

Our executive team actually did get what they needed – a really clear view on the state of their worldwide agile transformation. It wasn’t what they’d originally asked for but through the journey we’d been able to educate them about the non-binary nature of “being agile”.

But the cost, the delays, the iterative approach to developing the assessment, the cultural differences and the sheer scale of work involved weren’t sustainable. An assessment took anything from an hour to two days! We discovered that every question we asked was like a mini lesson in one more subtle aspect of agile.  Fortunately they got quicker after the teams had been through them once.

By the time we’d finished we’d started to see and learn more about the value in Kanban approaches and were applying our prior Lean experience and training rather than simply Scrum & XP + Culture. We’d have to face restructuring the assessment to accommodate even more new knowledge and realized this would never end. Surely that couldn’t be right.

Amongst the lessons from the assessments themselves, the cultural differences were probably my favourite.

  • Teams in the US took the assessment at face-value and good faith and gave an accurate representation of the state of play (I was expecting signs of the “hero” culture to come through but they didn’t materialize).
  • The teams in India were consistently getting higher marks without supporting evidence or outcomes.
  • Teams in England were cynical about the entire thing (the 2-day session was one of the first in England. Every question was turned into a debate).
  • The teams in Scotland consistently marked themselves badly on everything despite being some of our most experienced teams.

In hindsight this is probably a reflection on the level of actual knowledge & experience of each site.

Partway through the baseline assessments after a great conversation with one of the BA team in Cambridge (who sadly for us has since retired) we added another category – “trust”. His point was all the practices in the world were meaningless without mutual trust, reliability and respect.

It seemed obvious to us but for one particular site there was so much toxic politics between the business leadership and development that nobody could safely tackle that he had an entirely valid point. I can’t remember if we were ever brave enough to publish the trust results – somewhat telling perhaps? (Although the root cause “left to pursue other opportunities” in a political power struggle not long before I left).

Despite all this the baselining activities worked and we identified a common issue on almost all teams. Business engagement.

We were implementing Scrum & XP within a stage-gate process. Historically the gate at which work was handed over from the business to development was a one-way trip. Product managers would complete their requirements and then move on to market-facing activities and leave the team to deliver. If a team failed to deliver all their requirements it was historically “development’s fault” that the business’ numbers fell short. We were breaking down that wall and the increased accountability and interaction was loved by some and loathed by others.

We shifted our focus to the team/business relationship and eventually stopped doing the major assessments. We replaced them with a 10 question per-sprint stakeholder survey where every team member could anonymously provide input and the product managers view could be overlaid on a graph. This was simpler, focused and much more locally & immediately actionable. It highlighted disconnects in views and enabled collaborative resolution.

Here’s the 10 question survey.

Using a scale of -5 to +5 indicate how strongly you agree or disagree with each of the following statements  (where -5 is strongly disagree, 0 is neutral and +5 is strongly agree)


  • The iteration had clear agreement on what would be delivered
  • The iteration delivered what was agreed
  • Accepted stories met the agreed definition of done
  • What was delivered is usable by the customer
  • I am proud of what was delivered


  • I am confident that the project will successfully meet the release commitments
  • Technical debt is being kept to a minimum or is being reduced


  • Impediments that cannot be resolved by the team alone are addressed promptly
  • The team and product manager are working well together
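As a rough illustration of the "overlay" idea, the sketch below compares the team's average answer for each statement with the product manager's answer and lists the statements where the two views are far apart. It is a minimal sketch only: the function name `disconnects`, the data shapes and the threshold of 3 points are all assumptions made for this example, not details of the survey we actually ran.

```python
from statistics import mean

# Hypothetical sketch: find statements where the team's average view and the
# product manager's view differ noticeably on the -5..+5 scale.

def disconnects(team_scores, pm_scores, threshold=3):
    """Return {statement: gap} where gap = PM score minus team mean.

    team_scores maps each statement to a list of anonymous team answers.
    pm_scores maps each statement to the product manager's single answer.
    threshold is an assumed cut-off for what counts as a disconnect.
    """
    gaps = {}
    for statement, scores in team_scores.items():
        pm = pm_scores[statement]
        for value in list(scores) + [pm]:
            if not -5 <= value <= 5:
                raise ValueError(f"scores must be between -5 and +5, got {value}")
        gap = pm - mean(scores)
        if abs(gap) >= threshold:
            gaps[statement] = round(gap, 1)
    return gaps
```

A large gap in either direction is a prompt for a conversation, not a verdict on who is right.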

If you’re ever inclined to do an “agile assessment” of any type, get a really good understanding of what questions you’re trying to answer and what problems you’re trying to solve. Try to avoid methodology bias, keep it simple and focused and make sure it’s serving the right people in the right ways.

Oh – if you’re after a copy of the assessment, I’m afraid it’s one of the few things I can’t share. Those that attended the Agile Cambridge workshop have a paper copy (and this was approved by the company I was at at the time) but I don’t have the rights to share the full assessment now I’m no longer there. I also feel quite strongly that this type of assessment can be used for bad things – it’s a dangerous tool in the wrong circumstances.

Thanks – as always – for reading.

Seeing the Value in Task Estimates

Reading time ~5 minutes

a list of task estimate sizes with beta curves overlaid

You might be aware of the ongoing discussions around the #noEstimates movement right now. I have the luxury here of rarely needing to use estimates as commitments to management but I usually (not always) still ask my teams to estimate their tasks.

My consistently positive experiences so far mean I’m unlikely to stop any time soon.

3 weeks ago I joined a new team. I decided I wanted to get back into the commercial side of the business for a while so I’ve joined our Sales Operations team. (Think DevOps but for sales admin, systems, reporting, targeting & metrics).

Fortunately for me the current manager of the team who took the role on a month or so earlier is amazing. She has so much sales domain knowledge, an instinct for what’s going on and deeply understands what’s needed by our customers (the sales teams).

I’d been working with her informally for a while getting her up to speed on agile project management so by the time I joined the team already had a basic whiteboard in place, were having effective daily standups and were tracking tasks.

The big problem with an ops team is balancing strategic and tactical work. Right now the work is all tactical, urgent items come in daily at the cost of important but less urgent work.

We’re also facing capacity issues with the team and much of the work is all flowing to a single domain expert who’s due to go on leave for a few months this Summer – again a common problem in ops teams.

I observed the movement of tasks on the team board for a week to understand how things were running, spot what was flowing well and what was blocked. As I observed I noted challenges being faced and possible improvements to make. By the end of the week I started implementing a series of near-daily changes – My approach was very similar to that taken in “a year of whiteboard evolution“.

Since the start of April we’ve made 17 “tweaks” to the way the team works and have a backlog of nearly 30 more.

Last week we started adding estimates to tasks.

I trained the team on task estimation – it took less than 10 minutes to explain after one of our standups. The technical details on how I teach this are in my post on story points. But there’s more than just the technical aspect. (In fact the technicalities are secondary to be honest)

Here’s the human side of task estimation…

  • Tasks are estimated in what I describe as “day fragments” – essentially an effort hours equivalent of story points. These are periods of time “small enough to fit in your head”.
  • The distribution scale for task estimates I recommend is always the same. 0.5, 1, 2, 4, 8, 16, 24 hours. (the last 3 are 1, 2 and 3 days) – It’s rare to see a task with a “24” on it. This offers the same kind of declining precision we see with Fibonacci-based story point estimates.
  • For the level of accuracy & precision we’re after I recommend spending less than 30 seconds to provide an estimate for any task. (Usually more like 5-10)
  • If you can’t provide an estimate then you’re missing a task somewhere on understanding what’s needed.
  • Any task of size 8 or more is probably more than one task.
  • Simply having an estimate on a task makes it easier to start work on – especially if the estimate is small (this is one of the tactics in the Cracking Big Rocks card deck)
  • By having an estimate, you have a better idea of when you’ll be done based on other commitments and activities, this means you can manage expectations better.
  • The estimates don’t need to be accurate but the more often you estimate, the better you get at it.
  • When a task is “done”, we re-check the estimate but we only change the number if the result is wildly off. E.g. if a 1 day task takes just an hour or vice versa. And we only do this to learn, understand and improve, not to worry or blame.
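For anyone who prefers to see the scale written down, here is a minimal Python sketch of the two mechanical rules in the list above: snapping a gut-feel guess onto the scale, and flagging tasks that are probably more than one task. The names `snap_to_scale` and `should_split` are made up for this example, and rounding a guess up to the next size (rather than to the nearest) is my assumption.

```python
# Hypothetical sketch of the "day fragment" scale described above.
SCALE_HOURS = [0.5, 1, 2, 4, 8, 16, 24]  # the last three are 1, 2 and 3 days


def snap_to_scale(raw_hours: float) -> float:
    """Round a gut-feel guess up to the next size on the scale (assumed rule)."""
    if raw_hours <= 0:
        raise ValueError("an estimate must be positive")
    for size in SCALE_HOURS:
        if raw_hours <= size:
            return size
    raise ValueError("bigger than 3 days: break the task down first")


def should_split(estimate_hours: float) -> bool:
    """Any task of size 8 or more is probably more than one task."""
    return estimate_hours >= 8
```

For example, a guess of "about 3 hours" lands on 4, and anything estimated at 8 or above gets a second look before it goes on the board.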

So why is this worth doing?

Within a day we were already seeing improvements to our flow of work and after a week we had results to show for it.

  • The majority of tasks fell into the 0.5 or 1 hour buckets – a sign of lots of reactive small items.
  • Tasks with estimates of 8 hours or more (1 day’s effort) were consistently “stuck”.
  • We spotted many small tasks jumping the queue ahead of larger more important items despite not being urgent. (Because they were easier to deliver and well-understood)
  • Vague tasks that had been hanging around for weeks were pulled off of the board and replaced with a series of more concrete, smaller actions. (I didn’t even have to do any prompting)
  • Tasks that still couldn’t be estimated spawned 0.5 or 1 hour tasks to figure out what needed to be done.
  • Large blocked items started moving again.
  • Team members were more confident in what could be achieved and when.
  • We can start capacity planning and gathering data for defining service level agreements and planning more strategic work.

I’m not saying you have to estimate tasks but I strongly believe in the benefits they provide internally to a team.

If you’re not doing so already, try a little simple education with your teams and then run an experiment for a while. You can always stop if it’s not working for you.



A quick update – Janne Sinivirta pointed out that “none of the benefits seem to be from estimates, rather about task breakdown and understanding the tasks.”

He’s got a good point. This is a key thing for me about task estimation. It highlights quickly what you do & don’t understand. The value is at least partially in estimating, not estimates. (Much like the act of planning vs following a plan). Although by adding the estimates to tasks on the wall we could quickly see patterns in flow of tasks that were less clear before and act sooner.

As we move from tactical to strategic work I expect we’ll still need those numbers to help inform how much of our time we need to spend on reactive work. (In most teams I’ve worked in it’s historically about 20% but it’s looking like much more than that here so far).

Martin Burns also highlighted that understanding and breaking down tasks is where much of the work lies. The equivalent of that in this team is in recognising what needs investigation and discussion with users and what doesn’t and adding tasks for those items.

Suboptimising Around Data

Reading time ~3 minutes

I originally wrote this article back in about 2009/2010 whilst working at a large corporation with a very strong measurement culture.

just some data

As more teams and companies are adopting lean concepts and with the strong influence of the Lean Startup movement (which to reduce confusion is not the same as lean), this post feels relevant to finally publish publicly…

“Data is of course important in manufacturing, but I place the greatest emphasis on facts.” – Taiichi Ohno

There’s a great lean principle known as “Genchi Genbutsu” – “actual place, actual thing”. Generally we interpret this as: “Go and See at the source”.

When this critical pillar of lean thinking is eroded by a proxy we open ourselves to some painful problems.

Where organizations become too measurement focused, we risk our proxy for Genchi Genbutsu becoming data.

Lean and agile processes both rely on data but these are indicators only. Particularly in agile, there is a strong emphasis on data being used by the team for internal diagnostics – in fact very little agile material talks about data support for external management and there are good reasons for this.

Even where managers are fully aware that data are not the whole story, external or imposed measurement drives strange behaviors.

Talk to any individual or team that is measured on something that managers at one time or another thought was “reasonable” and chances are there will be a range of emotions from bemusement or cynicism to fear. All these responses will drive negative alterations in behavior. Occasionally there are good measures but they’re pretty rare.

You’ll find individuals and teams that are working in the “expected” way will be absolutely fine. But their behavior will now be constrained by the metrics and their capacity to improve will be limited.

Those that are aware of their constraints (and are often limited in what they can actually influence to solve their problems) will at best sub-optimize around their own goals and at worst “game” the system in order to preserve themselves. This is a natural self-preservation response.

I’ve seen the most extreme example of this using game theory in simulated product management (in a session run by Rich Mironov back in 2009).

A team of 4 product managers were each given a personal $1M sales target, a set of delivery resources and a product. Performance against their target was measured on a quarterly basis. In the example, the game was deliberately rigged. It was impossible for all product managers to meet their personal targets with the limited resources they were given. However, it was possible to achieve well above the total $4M target if product managers collaborated and in some cases were actually willing to sacrifice their own products, releases and resources in order to fund cycles on better-performing products.

Data may also only tell you how tools are being used. If a team is constantly inspecting and adapting, I would expect their tool usage to change. It may then not reflect expectations or worse, they may not be able to adapt for fear of damaging “the numbers”.

Here’s a great example of this from (of all places) a hair salon…

For a closer to home example try this experiment…

If you’re not already doing so, start measuring story cycle time (from commencement of work to acceptance and ideally delivery).

Now try the following:

  1. Measure cycle times without giving feedback to the team. What are you observing in the data? What can you infer from this? What can’t you see?
  2. Continue measuring but start reporting the numbers to the team and discussing observations. Ask the team what they’re observing, what they can infer and what can’t they see. What would they change?
  3. Ask your teams if they’re willing to report their data to “management”. What responses do you get?
  4. If the teams are willing, start reporting the data to management. Ask them what they’re observing and what they can infer. What can’t they see? What would they change?
  5. Consider the level of trust in your organization. How would the experiment above change behavior if trust was significantly higher or lower than its current level?
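If you don’t have a tool that reports cycle time for you, the measurement itself is simple. The sketch below is a minimal Python example, assuming you can get a start date and an acceptance date for each story; the function names and the choice of summary figures (count, median, longest) are mine, not a prescribed method.

```python
from datetime import date
from statistics import median

# Hypothetical sketch: cycle time from commencement of work to acceptance.

def cycle_time_days(started: date, accepted: date) -> int:
    """Calendar days between starting work on a story and its acceptance."""
    if accepted < started:
        raise ValueError("a story cannot be accepted before it is started")
    return (accepted - started).days


def summarise(stories):
    """Summarise a non-empty list of (started, accepted) date pairs."""
    if not stories:
        raise ValueError("need at least one finished story")
    times = sorted(cycle_time_days(s, a) for s, a in stories)
    return {
        "count": len(times),
        "median_days": median(times),
        "longest_days": times[-1],
    }
```

Remember that these numbers are only an indicator. The interesting part of the experiment is how the conversation changes at each step, not the figures themselves.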

Food for thought.

Stopping The Line

Reading time ~4 minutes

A few weeks ago the company I work for celebrated its 13th birthday. As part of the celebrations we were each given the latest copy of the “BoRG” – the company book. On reading through the pages I found one of the teams I’m responsible for received an award.

This isn’t quite as positive as it sounds and I’m pretty sure the incident will become legendary within the company. The lessons learned, what we did afterward and the forward thinking attitude of our senior management are however truly worth celebrating.

The (rough) Story

Over the summer one of our teams was working on some updates to our product deployment tools (we deploy upwards of 50 new releases across our product portfolio every month). Part of the automated process involves uploading a packaged installer for our software to a download location and updating our web site to point to the update.

Due to a mix-up between environments and configurations, one of our internal tests made it into the outside world. The problem was spotted and resolved fast but something was clearly wrong for this to have been possible.

This alone would have been rather embarrassing; however, this was about the 6th or 7th significant incident that had come from our operations teams in as many weeks. We’d recently restructured team ownership of parts of the codebase and were making a large number of significant infrastructure, library, test and build changes to our systems – mostly legacy code (code without sufficient tests). Moreover, we had added a whole new team onto the codebase with a very different remit in terms of approach and pace, and the volume of churn in the code had massively increased.

This was all during the height of holiday and conference season so many of us weren’t fully aware of the inner carnage that had been occurring. Handing over from one manager to the next repeatedly meant we’d not seen the bigger picture.

I returned from Agile 2012 to a couple of mails from my boss (who was now on holiday) to fill me in on what had happened with the (paraphrased) words: “Can I leave this safe in your hands”.

Over the first 2 hours of my return I was briefed by managers and team members on the situation. Everything was sorted, no problems were in the wild but the team’s credibility had taken a beating.

I’d seen similar things happen in other companies and had always been certain of the right course of action. This was the first time I actually felt safe to lead what I knew was right.

I donned my “Lean” hat and started my nemawashi campaign with our senior managers.

I spoke to each manager individually – they were already well aware of the problems which made things much easier. I simply said:

“These problems can’t continue, we’re going to ‘stop the line’. All projects are going to stop until we’re confident that we can progress again safely.”

I went a step further and set expectations on timescales.

We’d be stopping development work for nearly 20 staff for at least a week. We’d monitor progress daily, and approval to continue would be on the condition that we were confident problems would not reoccur.

By lunchtime I had unanimous support. It was described as a “brave” thing to do by our CEO but all agreed it was right.

A side-benefit of Lean is the shared language it provides. In every case when I approached our management team and explained that I wanted to “stop the line” they immediately understood what I meant plus the impact, value and message behind such an action.

Now of course you can’t prevent new problems with hindsight but you can identify patterns of failure and address these.  In our case I had a good understanding of what had been going on.

Initially I was strongly against performing a full root-cause analysis.  There were half a dozen independent incidents and a strong chance of finger-pointing if we’d gone through these. I was already “pretty sure” where our problems lay. The increased pace had led to a fall in technical discipline coupled with an increased pressure to deliver faster and a lack of sufficient safety net (insufficient smoke tests).

I divided the group into 3 teams to focus on 3 areas.

  • “before release” – technical practices
  • “at the point of release” – smoke tests
  • “after release” – system monitoring

With an initial briefing and idea workshop I stepped back and left the 3 teams to deliver.

The technical practices team developed a team “technical charter”.  We brought all participants together for a review, revised and then published this. Individuals have since signed up to follow this charter and we review it regularly to ensure it’s working.

The smoke testing team developed a battery of smoke tests for the most critical customer-facing areas (shopping cart, downloads etc). These are live and running daily.

The monitoring team developed a digital dashboard (that I can still see from my desk every day). This shows the status of the last run of smoke tests (and history), build status, system performance metrics and a series of alerts for key business metrics that would indicate a potential problem with the site – e.g. a tail-off in volume of downloads or invoices.

They also implemented some server-side status monitoring and alerts that we subscribe to via email.

Since these have been in place we *have* had a couple more incidents but in every case we’ve spotted and resolved it early.

Subsequently a couple of the teams have self-selected to perform a root-cause analysis on a couple of issues. This is exactly the behaviour I love to see, it’s not a management push, they simply wanted to ensure we’d pinned things down and done the right thing. Moreover, they published the results to the whole company.

The award…