On Story Points and Distributions

If you’ve been reading my posts for a long while, you might remember this curve in relation to fixing bugs.

[Figure: right-skewed beta distribution]

Today I’m resurrecting it for other reasons. First of all, I’ll admit I used to be a bit of an estimation geek. I loved the subject. (Really!)

Last summer I attended my first ever #noEstimates session at Agile 2014. What I heard made complete sense to me, and there are a few teams here that have stopped estimating (they still measure and forecast throughput). But I still see value in trying to estimate. I’ve seen enough examples over the last decade where work is traded out because a team cannot estimate the effort of the previous.

I love Arlo Belshee’s concept of Naked Planning – in that if you remove all estimation information and focus purely on what the most valuable and important thing is, you’ll do the most valuable and important thing.

The trouble is, in many commercial product situations, it’s not actually that clear what the most valuable thing really is – and value generally has a trade-off of cost. (Or at least “how much am I willing to spend on this”).

I also like Chris Matts’ Options thinking alternative to estimates – “How much are you willing to spend to get a better understanding of this?” This ties in with a fundamental part of estimation theory: there is a trade-off (and decreasing value) in spending additional time understanding a problem in order to better estimate the outcome.

Finally, whilst many teams I’ve worked with have used story point estimates for stories and hour estimates for tasks, we don’t go in blind. We have triangulation and reference points from prior work wherever possible.

I used to run half-day training courses on software estimation (and still see that course as valuable – primarily because the theory is transferable). These days, if I have a team that has never had a real conversation about what estimates are, why we do them and how, I trim that half-day content down to about 15-30 minutes on the most important bits for story and task estimation.

I’ve been using the same explanations on estimation for about 7 years now and although some of my assertions on why we estimate are finally wavering, the information on how is still useful – and – as far as I know, nobody else has explained it this way.

Back in 2011 I wrote an article on Swimlane Sizing that was subsequently referenced and made popular by Alexey Krivitsky in his Scrum Simulation With Lego Bricks paper.

What I want to share today is how all this hangs together based on a very simple concept from the 1950s.

The PERT technique came from the Polaris missile project and, in my very simple terms, is essentially a collection of tools and techniques based around probability distributions.

When examining the probability of completing any task there’s a great rule of thumb to start with:

There’s a limit to how well things can go but no limit to how bad they could get!

With this thinking in mind, essentially the completion time and/or effort for a given task can be represented by a probability distribution. A right-skewed beta distribution.

Furthermore, when adding a series of these together, you have a collection of averages. (cue a #noEstimates discussion).

If you ask someone to estimate how long a bug will take to fix, without historic data you’ll get a “how long is a piece of string” type answer. Everyone remembers the big statistical outliers but if you strip these away, you can forecast quite accurately with about 95% confidence. (This is one of the foundations behind using data for service level agreements in Kanban).

Some items will take longer than average, some will be faster but based on those averages, you can get a “good enough” idea on durations.
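To make the "collection of averages" idea concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration only: the 1 to 10 day range, the beta shape parameters and the batch size of 20 are made up, and a beta distribution caps the worst case, which real work does not. It draws task durations from a right-skewed beta distribution and compares the relative spread of a single task with that of a batch of tasks added together.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable


def task_duration(best=1.0, worst=10.0, alpha=2.0, beta=5.0):
    """Draw one task duration in days from a right-skewed beta distribution.

    With alpha < beta most of the mass sits near `best`, leaving a long
    tail towards `worst`. All four parameters are illustrative assumptions.
    """
    return best + (worst - best) * random.betavariate(alpha, beta)


def relative_spread(samples):
    """Standard deviation as a fraction of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# 10,000 individual tasks, and 10,000 batches of 20 tasks summed together.
single = [task_duration() for _ in range(10_000)]
batches = [sum(task_duration() for _ in range(20)) for _ in range(10_000)]

print(f"single task: mean {statistics.mean(single):.2f}, "
      f"median {statistics.median(single):.2f}, "
      f"relative spread {relative_spread(single):.2f}")
print(f"batch of 20: mean {statistics.mean(batches):.2f}, "
      f"relative spread {relative_spread(batches):.2f}")
```

For the single tasks the mean sits above the median, which is the right skew showing up. The batches have a much smaller relative spread than any one task, which is why forecasting on averages can be "good enough" even when individual items vary a lot.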

With PERT, this beta distribution is simplified further to 3 points – “optimistic”, “most likely” and “pessimistic”. The math is simple but at least for today’s point, not important.

Here are the bits I care about.

[Figure: PERT summary]

Now here’s a neat thing (and the point of this post).

With the introduction of story points, we’ve moved away from the amazing power of 3-point range estimation back to single points.

Once you’re in the realm of single point estimates, people start seeing a falsely implied precision. Unless you’ve actually had training on use of story points (many senior managers probably won’t have done) then you’ll start building all those same human inferences that used to occur with estimates like “It’ll take about 2 weeks”.

When we hear “2 weeks”, we leap to a precise assumption and start making commitments based on that. In range estimation we’d say, “It’ll take 1 to 4 weeks”, and in PERT we’d say, “Optimistically it’ll take a week, most likely 2 and pessimistically, 4. Therefore we’ll plan on 2 but offer a contingency of up to a further 2 weeks.”

(Of course in reality, wishful hearing means you might still end up with a 1 or 2 week commitment but hopefully the theory is making sense).

It’s also worth examining the size of the gaps between optimistic -> most likely and most likely -> pessimistic (in particular, the latter of these two). These offer a powerful window into the relative levels of risk and uncertainty.
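For anyone who does want the math, here is a small sketch using the commonly quoted PERT weighting (expected = (O + 4M + P) / 6, standard deviation = (P - O) / 6). The post itself doesn't spell these formulas out, so treat them as the textbook version rather than anything specific to my approach. The `ThreePoint` class and its method names are hypothetical, made up for this example, and the numbers are the 1 / 2 / 4 week estimate from above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """A three-point estimate (hypothetical helper for illustration)."""
    optimistic: float
    most_likely: float
    pessimistic: float

    def expected(self) -> float:
        # Textbook PERT weighting: the most likely value counts four times.
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    def std_dev(self) -> float:
        # Textbook PERT approximation of the standard deviation.
        return (self.pessimistic - self.optimistic) / 6

    def upside_gap(self) -> float:
        # How much better than "most likely" things could go.
        return self.most_likely - self.optimistic

    def downside_gap(self) -> float:
        # How much worse than "most likely" things could go.
        return self.pessimistic - self.most_likely


estimate = ThreePoint(optimistic=1, most_likely=2, pessimistic=4)

print(f"expected: {estimate.expected():.2f} weeks")   # 13 / 6, about 2.17
print(f"std dev:  {estimate.std_dev():.2f} weeks")    # 3 / 6 = 0.50
print(f"upside gap: {estimate.upside_gap()} week(s), "
      f"downside gap: {estimate.downside_gap()} week(s)")
```

The downside gap (2 weeks) is twice the upside gap (1 week), so the expected value lands slightly above the "most likely" 2 weeks. That asymmetry is the risk signal worth looking at.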

By moving back to single point estimates – at least at the single story level, this starts feeling a lot like precise and accurate commitments**. Our knee-jerk reaction may then be to avoid providing estimates again.

But here’s the missing link…

Every story point estimate is in fact a REPRESENTATION of a range estimate!

We can take this thinking even further…

The greater a story point estimate, the less precise it is.

Obvious, right? Let’s take one more step…

If a story point estimate is a representation of a range then a larger number implies a WIDER range.

Let’s take that back to the swimlane sizing diagram and (crudely) overlay some beta distributions…

[Figure: swimlane sizing diagram with beta distributions overlaid]

Look carefully at how those ranges fall.

  • Some 2 point stories take as long as a 3 point story.
  • Some 5 point stories may be the size of 3 point stories; in rare cases, others may end up being 13 or even 20 points.
  • And some 13 point stories may go off the scale.

And that’s all entirely acceptable!

On average a 5-point story will take X amount of effort.

That’s enough to forecast and that’s enough to start building commitments and service level agreements around (should you need to).

So…

Time to start thinking about what distributions your story point estimates represent.

If you’re getting wild variation, how might you capture some useful (but lightweight) data to help you, your team and your management understand and improve?
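One lightweight option, sketched below, is to record the actual effort for each finished story next to its point estimate and summarise per point value. The `history` data here is entirely made up for illustration, the function names are hypothetical, and the 85th percentile is just one plausible confidence level to build a service level agreement around.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical history: (story points, actual days). Invented numbers.
history = [
    (2, 1.0), (2, 1.5), (2, 3.0),
    (3, 2.0), (3, 2.5), (3, 4.0),
    (5, 3.0), (5, 4.5), (5, 5.0), (5, 12.0),
    (13, 9.0), (13, 15.0), (13, 30.0),
]


def nearest_rank_percentile(values, pct):
    """Smallest observed value that at least `pct` percent of values do not exceed."""
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def summarise(records, pct=85):
    """Group actuals by story point value and describe each group's range."""
    by_points = defaultdict(list)
    for points, days in records:
        by_points[points].append(days)
    return {
        points: {
            "count": len(days),
            "min": min(days),
            "mean": mean(days),
            "percentile": nearest_rank_percentile(days, pct),
            "max": max(days),
        }
        for points, days in sorted(by_points.items())
    }


summary = summarise(history)
for points, stats in summary.items():
    print(f"{points:>2} points: n={stats['count']}, "
          f"min={stats['min']}, mean={stats['mean']:.2f}, "
          f"p85={stats['percentile']}, max={stats['max']}")
```

In this invented data the slowest 2 point story took longer than the fastest 3 point story, and one 5 point story ran to 12 days. The ranges overlap, yet the mean per point value is still usable for forecasting.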

You might be in a fortunate place where you simply don’t need to produce estimates at all. That’s great. I’d assert there’s a lot to learn by simply trying to estimate, even if you don’t use the results for anything. But for the majority of us who need to do at least some sane forecasting, this thinking might just make estimation a bit safer and a bit more scientific again.

If you’re interested in more on this, take a look at the human side of estimation in “Seeing the Value in Task Estimates”.

**As an aside, a while ago, the Scrum Guide was updated and replaced “commitment” with “forecast”. That’s a big change and tricky to retrain into those that saw Scrum as the answer to their predictability problems. (Many managers needed guarantees!) For those of you facing continuing problems here, it’s worth gathering and reviewing story data so you can build service level agreements with known levels of confidence as an alternative.
