The Pitfalls of Measuring “Agility”

Reading time ~ 6 minutes

This post expands on one of the experiences I mentioned in “Rapunzel’s Ivory Tower“.

I presented these lessons and the story at Agile Cambridge back in 2010.  It’s taken nearly 5 years to see the light of day in writing on here. I hope it’s not too late to make a difference.

I and my team hadn’t been in our roles long. We’d been given a challenge. Our executives wanted to know “which teams are agile and which aren’t” (see the Rapunzel post for more). We managed to re-educate them and gain acceptance of a more detailed measurement approach (they we were all Six Sigma certified – these people loved measurement) and I’d been furiously pulling the pieces together so that when we had the time to work face to face we could walk away with something production ready.

Verging on quitting my job I asked James Lewis from Thoughtworks for a pint at The Old Spring. I was building a measurement system that was asking the right questions but there was no way I could see a path through it that would prevent it being used to penalize and criticise hard-working teams. This was a vital assessment for the company. It defined clearly the roadmap we’d set out, took a baseline measure of where we were and allowed us and teams to determine where to focus.

My greatest frustration was that many of the areas teams would score badly were beyond their immediate control – yet I knew senior management would have little time to review anything but the numbers.

James’ question he left me with was:

“How do you make it safe for teams to send a ‘help’ message to management instead?”

I returned to my desk fuelled by a fresh pair of eyes and a pint of cider. I had it!

At the time many agility assessments had two major flaws.

1 – they only have a positive scale – they’re masking binary answers making them look qualitative but they’re not.

2 – They assume they’re right – authoritative, “we wrote the assessment, we know the right answers better than you…”

  • What if the scale went up to 11 (metaphorically) How could teams beat the (measurement) system.
  • And what if 0 wasn’t the lowest you could score. What would that mean?

The assessment was built using a combination of a simpler and smaller agility assessment provided to us by Rally plus the “Scrum Checklist” developed by Henrik Kniberg, the “Nokia Test“, the XP “rules” and my own specific experiences around lightweight design that weren’t captured by any of these. As we learned more, so we adapted the assessment to bake in our new knowledge.  This was 2009/2010, the agile field was moving really fast and we were adding new ideas weekly.

The results were inspired – a 220 question survey covering everything we knew. Radar charts, organizational heat maps, the works.

The final version of the assessment (version 27!) covered 12 categories with an average of about 18 questions to score in each category:

  1. Shared responsibility & accountability
  2. Requirements
  3. Collaboration & communication
  4. Planning, estimation & tracking
  5. Governance & Assurance
  6. Scrum Master
  7. Product Owner
  8. Build and Configuration Management
  9. Testing
  10. Use of tools (in particular Rally)
  11. Stakeholder trust, delivery & commitment
  12. Design

The most valuable part was the scale:

  • -3 We have Major systemic and/or organizational impediments preventing this (beyond the team’s control)
  • -2 We have impediments that require significant effort/resource/time to address before this will be possible (the team needs support to address)
  • -1 We have minor/moderate impediments that we need to resolve before this is possible (within the team’s control)
  • 0 We don’t consider or do this / this doesn’t happen (either deliberate or not)
  • 1 We sometimes achieve this
  • 2 We usually achieve this
  • 3 We always achieve this (*always*)
  • 4 We have a better alternative (provide details in comments)

The assessment was designed as a half-day shared learning experience. For any score less than 3 or 4, we would consider & discuss what should be done and when, what were the priorities, where did the team need support, what could teams drive themselves and what were the impediments. Teams could also highlight any items they disagreed with that should be explored.

Actions were classified as:

  • Important but requires management support / organizational change to achieve
  • Useful, low effort required but requires more change support than low hanging fruit
  • Potential “low hanging fruit”, easy wins, usually a change in practice or communication
  • Important but requires significant sustained effort and support to improve

As a coaching team we completed one entire round of assessments across 14 sites around the globe and many teams then continued to self-assess after the baseline activity.

Our executive team actually did get what they needed – a really clear view on the state of their worldwide agile transformation. It wasn’t what they’d originally asked for but through the journey we’d been able to educate them about the non-binary nature of “being agile”

But the cost, the delays, the iterative approach to developing the assessment, the cultural differences and the sheer scale of work involved weren’t sustainable. An assessment took anything from an hour to two days! We discovered that every question we asked was like a mini lesson in one more subtle aspect of agile.  Fortunately they got quicker after the teams had been through them once.

By the time we’d finished we’d started to see and learn more about the value in Kanban approaches and were applying our prior Lean experience and training rather than simply Scrum & XP + Culture. We’d have to face restructuring the assessment to accommodate even more new knowledge and realized this would never end. Surely that couldn’t be right.

Amongst the lessons from the assessments themselves, the cultural differences were probably my favourite.

  • Teams in the US took the assessment at face-value and good faith and gave an accurate representation of the state of play (I was expecting signs of the “hero” culture to come through but they didn’t materialize).
  • The teams in India were consistently getting higher marks without supporting evidence or outcomes.
  • Teams in England were cynical about the entire thing (the 2-day session was one of the first in England. Every question was turned into a debate).
  • The teams in Scotland consistently marked themselves badly on everything despite being some of our most experienced teams.

In hindsight this is probably a reflection on the level of actual knowledge & experience of each site.

Partway through the baseline assessments after a great conversation with one of the BA team in Cambridge (who sadly for us has since retired) we added another category – “trust”. His point was all the practices in the world were meaningless without mutual trust, reliability and respect.

It seemed obvious to us but for one particular site there was so much toxic politics between the business leadership and development that nobody could safely tackle that he had an entirely valid point. I can’t remember if we were ever brave enough to publish the trust results – somewhat telling perhaps? (Although the root cause “left to pursue other opportunities” in a political power struggle not long before I left).

Despite all this the baselining activities worked and we identified a common issue on almost all teams. Business engagement.

We were implementing Scrum & XP within a stage-gate process. Historically the gate at which work was handed over from the business to development was a one-way trip. Product managers would complete their requirements and them move on to market-facing activities and leave the team to deliver. If a team failed to deliver all their requirements it was historically “development’s fault” that the business’ numbers fell short. We were breaking down that wall and the increased accountability and interaction was loved by some and loathed by others.

We shifted our focus to the team/business relationship and eventually stopped doing the major assessments. We replaced them with a 10 question per-sprint stakeholder survey where every team member could anonymously provide input and the product managers view could be overlaid on a graph. This was simpler, focused and much more locally & immediately actionable. It highlighted disconnects in views and enabled collaborative resolution.

Here’s the 10 question survey.

Using a scale of -5 to +5 indicate how strongly you agree or disagree with each of the following statements  (where -5 is strongly disagree, 0 is neutral and +5 is strongly agree)

SPRINT

  • The iteration had clear agreement on what would be delivered
  • The iteration delivered what was agreed
  • Accepted stories met the agreed definition of done
  • What was delivered is usable by the customer
  • I am proud of what was delivered

RELEASE

  • I am confident that the project will successfully meet the release commitments
  • Technical debt is being kept to a minimum or is being reduced

GENERAL

  • Impediments that cannot be resolved by the team alone are addressed promptly
  • The team and product manager are working well together

If you’re ever inclined to do an “agile assessment” of any type, get a really good understanding of what questions you’re trying to answer and what problems you’re trying to solve. Try to avoid methodology bias, keep it simple and focused and make sure it’s serving the right people in the right ways.

Oh – if you’re after a copy of the assessment, I’m afraid it’s one of the few things I can’t share. Those that attended the Agile Cambridge workshop have a paper copy (and this was approved by the company I was at at the time) but I don’t have the rights to share the full assessment now I’m no longer there. I also feel quite strongly that this type of assessment can be used for bad things – it’s a dangerous tool in the wrong circumstances.

Thanks – as always – for reading.

Leave a Reply

Your email address will not be published. Required fields are marked *