The Pitfalls of Measuring “Agility”


This post expands on one of the experiences I mentioned in “Rapunzel’s Ivory Tower”.

I presented these lessons and the story at Agile Cambridge back in 2010. It’s taken nearly 5 years for them to see the light of day in writing here. I hope it’s not too late to make a difference.

I and my team hadn’t been in our roles long when we were given a challenge. Our executives wanted to know “which teams are agile and which aren’t” (see the Rapunzel post for more). We managed to re-educate them and gain acceptance of a more detailed measurement approach (they were all Six Sigma certified – these people loved measurement), and I’d been furiously pulling the pieces together so that when we had the time to work face to face we could walk away with something production-ready.

Verging on quitting my job, I asked James Lewis from Thoughtworks for a pint at The Old Spring. I was building a measurement system that asked the right questions, but I couldn’t see a path through it that would prevent it being used to penalize and criticise hard-working teams. This was a vital assessment for the company: it clearly defined the roadmap we’d set out, took a baseline measure of where we were and allowed us and the teams to determine where to focus.

My greatest frustration was that many of the areas teams would score badly were beyond their immediate control – yet I knew senior management would have little time to review anything but the numbers.

The question James left me with was:

“How do you make it safe for teams to send a ‘help’ message to management instead?”

I returned to my desk fuelled by a fresh pair of eyes and a pint of cider. I had it!

At the time many agility assessments had two major flaws.

1 – They only have a positive scale – they mask binary answers, dressing them up as a graded measure when they’re not.

2 – They assume they’re right – authoritative: “we wrote the assessment, we know the right answers better than you…”

  • What if the scale went up to 11 (metaphorically)? How could teams beat the (measurement) system?
  • And what if 0 wasn’t the lowest you could score? What would that mean?

The assessment was built using a combination of a simpler and smaller agility assessment provided to us by Rally plus the “Scrum Checklist” developed by Henrik Kniberg, the “Nokia Test”, the XP “rules” and my own specific experiences around lightweight design that weren’t captured by any of these. As we learned more, so we adapted the assessment to bake in our new knowledge. This was 2009/2010, the agile field was moving really fast and we were adding new ideas weekly.

The results were inspired – a 220-question survey covering everything we knew. Radar charts, organizational heat maps, the works.

The final version of the assessment (version 27!) covered 12 categories with an average of about 18 questions to score in each category:

  1. Shared responsibility & accountability
  2. Requirements
  3. Collaboration & communication
  4. Planning, estimation & tracking
  5. Governance & Assurance
  6. Scrum Master
  7. Product Owner
  8. Build and Configuration Management
  9. Testing
  10. Use of tools (in particular Rally)
  11. Stakeholder trust, delivery & commitment
  12. Design

The most valuable part was the scale:

  • -3 We have major systemic and/or organizational impediments preventing this (beyond the team’s control)
  • -2 We have impediments that require significant effort/resource/time to address before this will be possible (the team needs support to address)
  • -1 We have minor/moderate impediments that we need to resolve before this is possible (within the team’s control)
  • 0 We don’t consider or do this / this doesn’t happen (either deliberate or not)
  • 1 We sometimes achieve this
  • 2 We usually achieve this
  • 3 We always achieve this (*always*)
  • 4 We have a better alternative (provide details in comments)

[Radar chart from Excel showing each category from the assessment]

The assessment was designed as a half-day shared learning experience. For any score below 3 or 4, we would consider and discuss what should be done and when: what the priorities were, where the team needed support, what the team could drive themselves and what the impediments were. Teams could also highlight any items they disagreed with that should be explored.
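The point of the negative end of the scale was to turn low scores into “help” messages rather than blame. Here’s a minimal sketch of how that aggregation might look – the category names, sample scores and function are my own illustration, not the original Excel workbook:

```python
# Illustrative sketch: aggregating answers on the -3..+4 scale so that
# negative scores surface as requests for help rather than failures.

from statistics import mean

# Hypothetical sample answers for two of the twelve categories.
answers = {
    "Testing": [2, 3, 1, 2, -1],
    "Governance & Assurance": [-3, -2, -1, 0, 1],
}

def summarise(answers):
    """Average each category and flag negative-scoring questions as either
    impediments the team cannot fix alone (-2/-3) or ones within the
    team's own control (-1)."""
    summary = {}
    for category, scores in answers.items():
        summary[category] = {
            "average": round(mean(scores), 2),
            "needs_management_support": any(s <= -2 for s in scores),
            "team_resolvable": any(s == -1 for s in scores),
        }
    return summary

report = summarise(answers)
for category, info in report.items():
    print(category, info)
```

The `needs_management_support` flag is the “help” signal James’ question pointed at: it makes systemic impediments visible to management without marking the team down for things beyond their control.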

Actions were classified as:

  • Potential “low hanging fruit” – easy wins, usually a change in practice or communication
  • Useful and low effort, but requiring more change support than the low hanging fruit
  • Important but requiring management support / organizational change to achieve
  • Important but requiring significant sustained effort and support to improve

As a coaching team we completed one entire round of assessments across 14 sites around the globe and many teams then continued to self-assess after the baseline activity.

Our executive team actually did get what they needed – a really clear view of the state of their worldwide agile transformation. It wasn’t what they’d originally asked for, but through the journey we’d been able to educate them about the non-binary nature of “being agile”.

But the cost, the delays, the iterative approach to developing the assessment, the cultural differences and the sheer scale of work involved weren’t sustainable. An assessment took anything from an hour to two days! We discovered that every question we asked was like a mini lesson in one more subtle aspect of agile. Fortunately the assessments got quicker once teams had been through them.

By the time we’d finished we’d started to see and learn more about the value in Kanban approaches and were applying our prior Lean experience and training rather than simply Scrum & XP + Culture. We’d have to face restructuring the assessment to accommodate even more new knowledge and realized this would never end. Surely that couldn’t be right.

Amongst the lessons from the assessments themselves, the cultural differences were probably my favourite.

  • Teams in the US took the assessment at face-value and good faith and gave an accurate representation of the state of play (I was expecting signs of the “hero” culture to come through but they didn’t materialize).
  • The teams in India consistently scored themselves higher without supporting evidence or outcomes.
  • Teams in England were cynical about the entire thing (the 2-day session was one of the first in England. Every question was turned into a debate).
  • The teams in Scotland consistently marked themselves badly on everything despite being some of our most experienced teams.

In hindsight this is probably a reflection on the level of actual knowledge & experience of each site.

Partway through the baseline assessments, after a great conversation with one of the BAs in Cambridge (who sadly for us has since retired), we added another category – “trust”. His point was that all the practices in the world were meaningless without mutual trust, reliability and respect.

It seemed obvious to us, but for one particular site there was so much toxic politics between business leadership and development – politics nobody could safely tackle – that he had an entirely valid point. I can’t remember if we were ever brave enough to publish the trust results – somewhat telling, perhaps? (Although the root cause of that politics “left to pursue other opportunities” in a political power struggle not long before I left.)

Despite all this, the baselining activities worked and we identified a common issue across almost all teams: business engagement.

We were implementing Scrum & XP within a stage-gate process. Historically the gate at which work was handed over from the business to development was a one-way trip. Product managers would complete their requirements, then move on to market-facing activities and leave the team to deliver. If a team failed to deliver all their requirements, it was “development’s fault” that the business’ numbers fell short. We were breaking down that wall, and the increased accountability and interaction was loved by some and loathed by others.

We shifted our focus to the team/business relationship and eventually stopped doing the major assessments. We replaced them with a 10-question, per-sprint stakeholder survey where every team member could anonymously provide input and the product manager’s view could be overlaid on a graph. This was simpler, focused and much more locally and immediately actionable. It highlighted disconnects in views and enabled collaborative resolution.

Here’s the 10-question survey.

Using a scale of -5 to +5, indicate how strongly you agree or disagree with each of the following statements (where -5 is strongly disagree, 0 is neutral and +5 is strongly agree):

Sprint

  • The iteration had clear agreement on what would be delivered
  • The iteration delivered what was agreed
  • Accepted stories met the agreed definition of done
  • What was delivered is usable by the customer
  • I am proud of what was delivered

Release

  • I am confident that the project will successfully meet the release commitments
  • Technical debt is being kept to a minimum or is being reduced

General

  • Impediments that cannot be resolved by the team alone are addressed promptly
  • The team and product manager are working well together
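The overlay of the product manager’s view on the team’s anonymous answers is what surfaced the disconnects. As a rough sketch of how that comparison might work (the sample scores, threshold and function names here are my own illustration, not the original tool):

```python
# Illustrative sketch: flagging disconnects between the team's anonymous
# answers and the product manager's answers on the -5..+5 agreement scale.

from statistics import median

questions = [
    "The iteration had clear agreement on what would be delivered",
    "The iteration delivered what was agreed",
]

# Hypothetical anonymous team responses and the PM's own answers.
team_responses = {
    questions[0]: [4, 5, 3, 4],
    questions[1]: [-2, -1, 0, -2],
}
pm_responses = {questions[0]: 4, questions[1]: 3}

def find_disconnects(team, pm, threshold=3):
    """Return questions where the PM's view differs from the team's
    median by at least `threshold` points - candidates for a
    collaborative conversation rather than a scorecard."""
    flagged = []
    for question, scores in team.items():
        gap = pm[question] - median(scores)
        if abs(gap) >= threshold:
            flagged.append((question, gap))
    return flagged

disconnects = find_disconnects(team_responses, pm_responses)
```

Using the median rather than the mean keeps one outlying anonymous answer from masking (or manufacturing) a disconnect; the output is a discussion list, not a score.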

If you’re ever inclined to do an “agile assessment” of any type, get a really good understanding of what questions you’re trying to answer and what problems you’re trying to solve. Try to avoid methodology bias, keep it simple and focused and make sure it’s serving the right people in the right ways.

Oh – if you’re after a copy of the assessment, I’m afraid it’s one of the few things I can’t share. Those who attended the Agile Cambridge workshop have a paper copy (sharing that was approved by the company I was at at the time), but I don’t have the rights to share the full assessment now that I’m no longer there. I also feel quite strongly that this type of assessment can be used for bad things – it’s a dangerous tool in the wrong circumstances.

Thanks – as always – for reading.

Rapunzel’s Ivory Tower


“If the problem exists on the shopfloor then it needs to be understood and solved at the shopfloor” – Wikipedia

Some years ago I took on an agile coaching role at a very large corporation. Like many stereotypical large corporations, they were seen as data-driven, process and documentation-heavy. Management culture was perceived as measurement-focused, command & control and low-trust.

They had a very well established set of Lean practices and managers promoted strong values around empowerment. Despite Lean training for all staff, there was still a very limited “Go See” culture.  Above a certain level it was still traditional management-by-numbers and standardization – mostly by apparent necessity through scale.

James Lewis recognized some of these challenges (though his take was perhaps more brutal than my insider view).

At the start of the transformation the leaders wanted to know “who’s agile and who isn’t”.

Disturbing as the thought might seem, their motivation was sound. We’d all put our careers on the line to “go agile” in order to turn around a struggling group. The last thing needed was a disaster project with “Agile” being labelled as the cause of failure.

(Nearly 2 years further down the line, we managed to have at least one project fail early and be recognised as saving over a million dollars).

We developed an extensive “agility assessment” in order to teach all those involved that “being agile” wasn’t a binary question and wasn’t just about Scrum practices.

The measurement system for the assessment acknowledged that whilst there may be “good” answers, there are no “right” answers or “best practices” – teams could actually beat the system. (If there were “one true way” of developing software, the industry would be very dull).

Beyond measurement, the big challenge I and my team faced was the pressure to “operationalize” agile. To develop common standards, procedures, work instructions, measurements and tools worldwide. The Quality Management System (QMS) culture from our manufacturing background meant that interpretation of ISO accreditation needs was incredibly stringent and was required in order to do business with many customers.

Ironically that requirement kept us almost entirely away from the teams delivering software!

Operationalization was what our managers were asking for and it was very difficult to say “no”. Traditional corporate culture defined this as the way things should be.

So, from stepping into a role where we expected the gloves to come off – where we could get out of the management bubble and start making a real difference with teams – within a few months my entire team found themselves unwittingly captive in an ivory tower.

We saw it coming and felt powerless to stop it, but as permanent employees fresh into very high-profile roles, we could not comfortably raise those painful home truths.

I and my team spent that first period doing what was asked of us and helping teams out for the odd few days at a time wherever we could.

Fortunately all was not lost. At the same time, we invested in a highly experienced external group to engage on each of our sites and drive some of the changes we needed to achieve from within the teams.

Was the value I’d hoped to add in my role lost? – Actually no.

The managers got what they wanted – heavily seeded with our own more balanced agile/lean understanding and experience.

We weren’t perfect, but we made a significant series of improvements. The teams actually delivering products had far more experienced consultants supporting them who, as contractors, could take the right risks that permanent staff could not have taken at the time.

This 2-tier approach actually gave the delivery teams more air-cover to find their own way whilst we worked on coaching the management.

The teams still had a long way to go but were heading in the right direction and getting progressively better. At the same time, the management team learned that Agile isn’t simply a case of running two days of Scrum Master training, developing a set of procedural documentation and expecting everything to show a 1,000% improvement.

After the initial bedding-in period, I and my team were able to build up sufficient trust with our leaders that we could set future direction ourselves. The kick-start needed on change within the teams had already been made (far more effectively than we could have achieved alone).

With our leadership trust established, after being holed up in a tower for too long, our coaching team were able to reach the real world again. This time it was entirely within our own control, with the management support we needed and enough credibility remaining with the teams we had interacted with to move forward.

We were free, able to step in, learn more, tune, help out and spend months at a time properly embedded on teams taking them forward – reaching that point of empowerment for our team was a coaching journey in itself.

If you’re in the fortunate position of being an agile coach, or in a similar role in a very large or more traditional organization, recognize that your coaching efforts will often be needed as much (if not more) with your leaders as with your teams.