On Story Points and Distributions


If you’ve been reading my posts for a long while, you might remember this curve in relation to fixing bugs. Today I’m resurrecting it for other reasons.

[Figure: a right-skewed beta distribution]

First of all, I’ll admit I used to be a bit of an estimation geek. I loved the subject (really!).

Last summer I attended my first ever #noEstimates session at Agile 2014. What I heard made complete sense to me, and a few teams here have stopped estimating (they still measure and forecast throughput), but I still see value in trying to estimate. I’ve seen enough examples over the last decade of work being traded out because a team couldn’t estimate the effort of the previous work.

I love Arlo Belshee’s concept of Naked Planning – in that if you remove all estimation information and focus purely on what the most valuable and important thing is, you’ll do the most valuable and important thing.

The trouble is, in many commercial product situations, it’s not actually that clear what the most valuable thing really is – and value generally has a trade-off of cost. (Or at least “how much am I willing to spend on this”).

I also like Chris Matts’ Options-thinking alternative to estimates – “How much are you willing to spend to get a better understanding of this?”. This ties in with a fundamental part of estimation theory: there is a trade-off (and diminishing return) in spending additional time understanding a problem in order to better estimate the outcome.

Finally, whilst many teams I’ve worked with have used story point estimates for stories and hour estimates for tasks, we don’t go in blind. We have triangulation and reference points from prior work wherever possible.

I used to run half-day training courses on software estimation (and still see that course as valuable – primarily because the theory is transferrable). These days, if I have a team that has never had a real conversation about what estimates are, why we do them and how, I trim that half-day of content down to about 15-30 minutes on the most important bits for story and task estimation.

I’ve been using the same explanations on estimation for about 7 years now and although some of my assertions on why we estimate are finally wavering, the information on how is still useful – and, as far as I know, nobody else has explained it this way.

Back in 2011 I wrote an article on Swimlane Sizing that was subsequently referenced and made popular by Alexey Krivitsky in his Scrum Simulation With Lego Bricks paper.

What I want to share today is how all this hangs together, based on a very simple concept from the 1950s.

The PERT technique came from the Polaris missile project and, in my very simple terms, is essentially a collection of tools and techniques based around probability distributions.

When examining the probability of completing any task there’s a great rule of thumb to start with:

There’s a limit to how well things can go but no limit to how bad they could get!

With this thinking in mind, essentially the completion time and/or effort for a given task can be represented by a probability distribution. A right-skewed beta distribution.

Furthermore, when you add a series of these together, you’re summing a collection of averages – individual over- and under-runs tend to cancel out (cue a #noEstimates discussion).

If you ask someone to estimate how long a bug will take to fix, without historic data you’ll get a “how long is a piece of string” type answer. Everyone remembers the big statistical outliers but if you strip these away, you can forecast quite accurately with about 95% confidence. (This is one of the foundations behind using data for service level agreements in Kanban).

Some items will take longer than average, some will be faster but based on those averages, you can get a “good enough” idea on durations.
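To make that concrete, here’s a minimal sketch (in Python, with made-up numbers and simple nearest-rank percentiles) of how a team might turn historic cycle times into service level figures:

```python
# Illustrative only: turning historic cycle times (days per work item) into
# service level figures using simple nearest-rank percentiles.

def percentile(data, pct):
    """Value below which roughly `pct` percent of the observations fall."""
    ordered = sorted(data)
    rank = max(1, round(pct / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1]

# Made-up cycle times in days for recently completed work items
cycle_times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10, 13, 21]

for pct in (50, 85, 95):
    print(f"{pct}% of items finish within {percentile(cycle_times, pct)} days")
```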

With PERT, this beta distribution is simplified further to 3 points – “optimistic”, “most likely” and “pessimistic”. The math is simple but at least for today’s point, not important.

Here are the bits I care about.

[Figure: PERT 3-point estimate summary]
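For the curious, the “simple math” boils down to the standard PERT weighted mean and spread. The figures below are examples only:

```python
# The standard PERT formulas behind a 3-point estimate; figures are examples only.

def pert(optimistic, most_likely, pessimistic):
    expected = (optimistic + 4 * most_likely + pessimistic) / 6  # weighted mean
    spread = (pessimistic - optimistic) / 6                      # rough standard deviation
    return expected, spread

# "Optimistically a week, most likely two, pessimistically four" (in working days)
expected, spread = pert(5, 10, 20)
print(f"expected ~{expected:.1f} days, spread ~{spread:.1f} days")
```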

Now here’s a neat thing (and the point of this post).

With the introduction of story points, we’ve moved away from the amazing power of 3-point range estimation back to single points.

Once you’re in the realm of single-point estimates, people start seeing a falsely implied precision. Unless you’ve actually had training in the use of story points (many senior managers probably won’t have), you’ll start building all those same human inferences that used to occur with estimates like “It’ll take about 2 weeks”.

When we hear “2 weeks”, we leap to a precise assumption and start making commitments based on that. In range estimation we’d say, “It’ll take 1 to 4 weeks”, and in PERT we’d say, “Optimistically it’ll take a week, most likely 2 and pessimistically, 4. Therefore we’ll plan on 2 but offer a contingency of up to a further 2 weeks.”

(Of course in reality, wishful hearing means you might still end up with a 1 or 2 week commitment but hopefully the theory is making sense).

 

It’s also worth examining the size of the gaps between optimistic -> most likely and most likely -> pessimistic (in particular, the latter of the two). These offer a powerful window into the relative levels of risk and uncertainty.
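As a trivial illustration, using the same example figures as the PERT sketch above:

```python
# Reading the gaps in a 3-point estimate (example figures only)
optimistic, most_likely, pessimistic = 5, 10, 20  # days

upside_gap = most_likely - optimistic      # how much better it could realistically go
downside_gap = pessimistic - most_likely   # how much worse it could plausibly get

# A downside gap much larger than the upside gap signals risk and uncertainty
# worth discussing before anyone hears a single number.
print(f"upside {upside_gap} days, downside {downside_gap} days, "
      f"ratio {downside_gap / upside_gap:.1f}")
```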

By moving back to single-point estimates – at least at the individual story level – estimates start feeling a lot like precise and accurate commitments**. Our knee-jerk reaction may then be to avoid providing estimates again.

But here’s the missing link…

Every story point estimate is in fact a REPRESENTATION of a range estimate!

We can take this thinking even further…

The greater a story point estimate, the less precise it is.

Obvious right? Let’s take one more step…

 If a story point estimate is a representation of a range then a larger number implies a WIDER range.

Let’s take that back to the swimlane sizing diagram and (crudely) overlay some beta distributions…

[Figure: swimlane sizing diagram with beta distributions crudely overlaid]

Look carefully at how those ranges fall.

  • Some 2 point stories take as long as a 3 point story.
  • Some 5 point stories may be the size of 3 point stories, in rare cases, others may end up being 13 or even 20 points.
  • And some 13 point stories may go off the scale.

And that’s all entirely acceptable!

On average a 5-point story will take X amount of effort.

That’s enough to forecast and that’s enough to start building commitments and service level agreements around (should you need to).
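If you want to see this behaviour for yourself, here’s a rough Monte Carlo sketch. The per-point-size distributions are invented for illustration – calibrate them from your own data – but they show both the overlap between sizes and how the total still forecasts reasonably well:

```python
import math
import random

# (median_days, spread) per story point size - invented, illustrative values;
# calibrate these against your own historical data.
POINT_PROFILES = {2: (1.5, 0.4), 3: (2.5, 0.5), 5: (4.0, 0.6), 13: (10.0, 0.8)}

def simulate_story(points):
    median_days, sigma = POINT_PROFILES[points]
    # A lognormal gives the right-skewed shape: limited upside, a long tail downside.
    return random.lognormvariate(math.log(median_days), sigma)

def simulate_backlog(backlog, runs=10_000):
    totals = sorted(sum(simulate_story(p) for p in backlog) for _ in range(runs))
    return totals[len(totals) // 2], totals[int(len(totals) * 0.95)]

median_total, p95_total = simulate_backlog([2, 3, 3, 5, 5, 13])
print(f"median total ~{median_total:.0f} days, 95th percentile ~{p95_total:.0f} days")

# Individual stories overlap (a slow 2-pointer can outlast a quick 5-pointer) but
# the total is far more stable - the "collection of averages" effect at work.
```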

So…

Time to start thinking about what distributions your story point estimates represent.

If you’re getting wild variation, how might you capture some useful (but lightweight) data to help you, your team and your management understand and improve?

You might be in a fortunate place where you simply don’t need to produce estimates at all. That’s great. I’d assert there’s a lot to learn by simply trying to estimate even if you don’t use the results for anything, but for the majority of us who need to do at least some sane forecasting, this thinking might just make estimation a bit safer and a bit more scientific again.

If you’re interested in more on this, take a look at the human side of estimation in “Seeing the Value in Task Estimates”.

**As an aside, a while ago the Scrum Guide was updated and replaced “commitment” with “forecast”. That’s a big change and tricky to retrain into those that saw Scrum as the answer to their predictability problems. (Many managers needed guarantees!) For those of you facing continuing problems here, it’s worth gathering and reviewing story data so you can build service level agreements with known levels of confidence as an alternative.

Cracking Big Rocks


[Image: Bramber Castle, side perspective view]

Things have been rather busy over the last couple of months. I’ve joined a new division at my current company and have the basis for another dozen articles slowly developing but that’s just the start. I have some really exciting news!

After over a year of development with the exceptionally talented Johanna Hunt (@joh), through field testing, workshops, paper prototypes, conferences, conversations and peer reviews, I’m very pleased to announce the launch of crackingbigrocks.com.

The Concept

In trying to solve problems of our own, we found challenges that we now identify as “Big Rock” problems – those things that, when faced alone, cause us sleepless nights and illogical stress. The mental load associated with Big Rock problems can be so taxing that every time we go to tackle them we find ourselves procrastinating and avoiding them, or exhausted through trying. A deep breath, a step back, a few pointers, a second brain and some support can help us get back on track.

We’ve faced these problems in both our personal lives and professional careers. For example, the series of articles I wrote on “The Oubliette” describes the multiple strategies I used for reducing a major defect backlog.

(I’ve spent the last 2 weeks reusing the oubliette strategies along with a few new ones to help my new team get our quality back under control and keep the motivation to improve high.)

 

Simple Patterns & Coaching Cards

Between us we’ve taken the “Simple Patterns” concept and our combined experiences of solving large, difficult problems and developed a set of over 50 simple problem-solving patterns. In particular, we’ve captured the essence of big mental hurdles and how these can be overcome in ways that have resonated with almost everyone we’ve shared them with. (Of the 150 or so people who’ve explored the concepts so far, only two didn’t find they had “Big Rock” problems of their own.)

We’ve produced a limited first edition set (only 50 decks) of high quality coaching cards containing 45 patterns. (Within a week of the box being delivered, with no marketing at all, we’re already down to just 35 decks left!)

As of about June 2013, the first edition has sold out; however, the expanded second edition is now available on Amazon.

It turns out the ideas and concepts we’ve captured and the formats we’ve chosen are significantly more popular than we expected. Back in August 2012, after a weekend of hand-trimming and cutting (causing a few injuries and RSI), we produced nearly 40 paper prototype decks with unedited wording and far fewer patterns. We gave all of these away – and left a few attendees disappointed – after running a hugely successful workshop to a packed-out room of nearly 80 attendees at Agile 2012. A month later we had to produce a few extras for a re-run at Agile Cambridge (and took the opportunity for some edits whilst we were at it).

We delivered our first bulk-order for nearly 200 decks for attendees at Agile Cambridge in 2013 and ran another packed-out session. Our good friend Olaf regularly takes a few sets out with him to Play4Agile in Germany.

Even whilst developing and trimming the original prototype decks, we actually used the patterns within the decks to “keep rolling” – the idea of having to manually cut out and collate over 1500 paper playing cards is quite a “Big Rock” in itself. We measured our throughput rate, adjusted our cutting process, optimised our flow and working practices, banked our results in suitable sized batches and made sure our pace was (mostly) sustainable!

Here’s a quick preview of what’s in the box…

[Photo: Cracking Big Rocks coaching cards]

The majority of patterns in the deck are unique to us but there are a couple that are better known (such as the “Rubber Duck” shown above). What’s unique here is the format, approach, style and, most intriguingly, the community we’re aiming to build. We want to share the ideas and experiences everyone has using these cards as coaching and problem-solving tools.

If you’d like to find out more, head on over to crackingbigrocks.com and take a look.

OK. Marketing over – I generally find selling or marketing what we do a little crass so I hope you as readers don’t mind too much!

Escaping the Oubliette (Part 4) – The Litter Patrol


As promised in my last installment on oubliettes…

Your team might not be fully ready for the merciless refactoring encouraged by some agile approaches, but this will help you keep heading in the right direction whilst balancing delivery against refactoring.

The cost of change grows steeply (roughly exponentially) with debt. I’ve seen first-hand how high-debt systems become almost impossible to change and it’s not pretty.

In a debt-ridden system we are eventually faced with a choice: refactor or replace. Eventually even once-newly-replaced systems build up debt and the refactor/replace choice returns. Craig Larman & Bas Vodde’s most recent book covers the debt/cost relationship brilliantly in the section on “legacy code”. They also describe the oubliette strategy of “Do No Harm” or, as I call it, “The Litter Patrol”.

This is a particularly powerful debt management approach as it’s both a prevention and reduction strategy.

Here’s the basic concept…

When working with an area of legacy code, you’re working in a particular “neighbourhood”. If that neighbourhood is untidy, your care and attention to it is diminished. Much like the “broken windows” principle: once the first damage goes unrepaired, neglect and further damage follow and overall code quality deteriorates rapidly.

So (without going overboard), every time you’re working in a particular neighbourhood, what can you do to clean up a few pieces of litter?

Not the run-down shopping district between lines 904 and 2897 but more the abandoned classic car between lines 857 and 892 or the overflowing trashcan on the corner between 780 and 804.

If you introduce a litter patrol in your teams and encourage a hygiene cycle with every change to your code, your debt load and future cost of change will rapidly reduce for the areas you hit most frequently.

Unfortunately, although this is easier than a complete refactoring of a poorly designed class hierarchy or monolithic god-class, performing the litter patrol safely requires good unit tests or small functional tests for that neighbourhood, and ideally some refactoring support (I like to call these your gloves, garbage bag and litter picker).

This challenge doesn’t mean don’t do it. In fact, if you don’t have tests already, maybe your next patrol isn’t changing the code at all but writing just a couple of small, independent tests to demonstrate how that area is expected to behave. (You might even need to make some tweaks to make it testable.)
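As a sketch of what those first couple of tests might look like – the module, function and expected values here are entirely hypothetical stand-ins for whatever neighbourhood you’re visiting:

```python
# A couple of small "pin it down" tests for a hypothetical legacy neighbourhood.
# `legacy_pricing.calculate_discount` stands in for whatever code you're visiting;
# the expected values simply record today's observed behaviour, warts and all.

import unittest

from legacy_pricing import calculate_discount  # hypothetical legacy module


class DiscountCharacterisationTests(unittest.TestCase):

    def test_new_customer_gets_no_discount(self):
        self.assertEqual(calculate_discount(order_total=100, loyalty_years=0), 0)

    def test_long_standing_customer_gets_ten_percent(self):
        self.assertEqual(calculate_discount(order_total=100, loyalty_years=5), 10)


if __name__ == "__main__":
    unittest.main()
```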

If it’s hard, don’t avoid the problem, focus on making life easier one bite or constraint at a time.

Every time we successfully deliver a small clean-up task, the future cost of change to that area is reduced and our incentive to keep it clean is improved.

Look out for part 5 – sponsorship – coming soon.

Priority Fatigue


This came to me at 5am after a bout of insomnia…

In the last few years the concept of technical debt has really taken root. Teams discuss it and use it to ensure important leftovers get cleared, not just business-critical priorities.

Here’s a fresh verbal anchor…

“Priority Fatigue” – The wear that sets in if you do nothing but focus on the priorities of your products or leaders all the time.

If you’re using Scrum or Kanban, chances are you’re working through some kind of prioritized backlog of work. Most Scrum practitioners are aware that despite iterations being called sprints, the team are actually running a marathon. Every now and again your team needs to take a breather, and at the end of a release they need proper recovery time.

Clearing technical debt is a common way of recovering. Another approach used by forward-thinking organizations is to have periodic innovation days or weeks where everyone “downs tools” and does something interesting instead. Good team-building days or activities are a third option.

These are all ways of addressing priority fatigue on a team.

Weekends and holidays are the personal slack that we use to pay off some of our individual priority fatigue; however, many of us don’t actually rest any more.

Our lives are so full we don’t have time to recover. In fact, many people now continue (at least partially) working even whilst on holiday – it’s frequently expected these days.

In the same way we relieve priority fatigue for teams, consider taking time to step back and reward ourselves as individuals in innovative ways. If nothing else, take some regular time out to do something interesting even if it’s not important.**

**Caveat: don’t overdo it! Strike a balance with your priorities.

Patterns For Collective Code Ownership


Following the somewhat schizophrenic challenges of dealing with the people issues of collective code ownership, here we focus on the practices and practical aspects. How do we technically achieve collective ownership within teams?

We’re using the “simple pattern” approach again. For each pattern we have a suitable “anchor name”, a brief description and nothing more. This should be plenty to get moving, but feel free to expand on these and provide feedback.

Here’s the basic set of patterns to consider:

Code Caretaker

Let’s make it a bit less personal and encourage the team to understand what the caretaker role for code entails. It’s not a single person’s code – in fact, the company paid us to write it for them. Code does still need occasional care and feeding – enter the “Caretaker”. Be wary though that caretakers may become owners.

Apprenticeship

Have your expert spend time teaching & explaining. An apprentice learns by doing – the old-fashioned way under the tutelage of a master craftsman guiding them full-time. This is time and effort-intensive for the pair involved but (unless you have a personality or performance problem) is a sure-fire way to get the knowledge shared.

If you need to expand to multiple team members in parallel,  skip forward to feature lead & tour of duty.

Ping-Pong Pairing

Try pair programming with an expert and novice together. Develop tests, then code, then swap places. One person writes tests, the other codes. Depending on the confidence of the learner, they may focus more on writing the tests and understanding the code rather than coding solutions. Unequal pairing is quite tricky, so monitor this carefully; your expert will need support in how to share, teach and coach.

Bug Squishing

Best with large backlogs of low-severity defects to start with. Cluster them into functional areas. Set a time-box to “learn by fixing” – delivery is not measured on volume fixed but by level of understanding (demonstrated by a high first-time pass rate on peer reviews). If someone needs help, they learn to ask (or try, then ask) rather than handing work over half-done. This approach works with individuals having peer review support and successfully scales to multiple individuals operating on different areas in parallel.

A Third Pair of Eyes (Secondary Peer Review)

With either an “initiate” (new learner) or an experienced developer working on the coding, have both a learner and an expert participate in peer reviews on changes coming through – to learn and provide a safety net respectively. The newer developer should be encouraged to provide input and ask questions but has another expert as “backstop” for anything they might miss.

Sightseeing / Guided Tour

Run a guided walk-through for the team – typically a short round-table session. Tours are either led by the current SME (subject matter expert) or by a new learner (initiate). In the case of a new learner, the SME may wish to review their understanding first before encouraging them to lead a walk-through themselves.

Often an expert may fail to identify and share pieces of important but implicit information (tacit knowledge) whilst someone new to the code may spot these and emphasize different items or ask questions that an expert would not. Consider having your initiate lead a session with the expert as a backstop.

Feature Leads

The feature lead approach is one of my favourites. I’ve successfully acted as a feature lead on many areas with teams of all levels of experience where I needed to ramp up a large group on a functional area together.

By introducing a larger team, your expert will not have the capacity to remain hands-on and support the team at the same time. They must lead the team in a hands-off way by providing review and subject/domain expertise only. This also addresses the risk, seen with “apprenticeship”, of simply trading one single point of failure for a new one.

This is also a potential approach when an expert or prior owner steps in too frequently to undo or overwrite rather than coach other members’ activities. By ensuring they don’t have the bandwidth to get too involved, you may be able to encourage some backing-off, but use this approach with care in these situations to avoid personal flare-ups.

Cold Turkey

For extreme cases where “you can’t possibly survive” without a single point of failure, try cold turkey. Force yourself to work without them.

I read an article some years ago (unfortunately I can’t track it down now) where the author explained how, when he joined a project as manager, the leaders told him he must have “Dave” on the project as nobody else knew what Dave did. He spent a week of sleepless nights trying to figure out how to get Dave off the project. After removing Dave from the project, the team were forced to learn the bottleneck area themselves and delivered successfully.

(Apologies to any “Dave”s I know – this isn’t about any of you)

Business Rule Extraction

Get your initiate to study the code and write a short document, wiki article or blog defining the business rules in the order or hierarchy they’re hit. This is usually reserved for absolute new starters to learn the basics of an area and show they’ve understood it without damaging anything. It is however also a good precursor to a test retrofit.

Test Retrofit

A step up from business rule extraction whilst delivering some real value to the team. Encourage your learner to write a series of small functional tests for a specific functional area by prodding it, working out how to talk to it, what it does and what we believe it should do. Save the passing tests and get the results checked, peer reviewed and approved.

Chances are you might find some bugs too – decide whether to fix them now or later depending on your safety margin with your initiate.
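As a rough sketch of one way to approach the retrofit – the entry point, inputs and file name below are hypothetical, “golden master” style examples:

```python
# A sketch of a "golden master" style test retrofit: probe the area with a spread
# of inputs, record what it actually returns, and fail if that ever changes.
# `tax_engine.quote` and the probe inputs are hypothetical - substitute your own area.

import json
from pathlib import Path

from tax_engine import quote  # hypothetical functional area under test

PROBES = [(100, "UK"), (100, "DE"), (0, "UK"), (99_999, "US")]
APPROVED = Path("approved_quotes.json")

def current_results():
    return {f"{amount}-{region}": quote(amount, region) for amount, region in PROBES}

def test_quotes_match_approved_behaviour():
    if not APPROVED.exists():  # first run: record behaviour, then get it peer reviewed
        APPROVED.write_text(json.dumps(current_results(), indent=2))
    assert current_results() == json.loads(APPROVED.read_text())
```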

Debt Payment

After a successful test retrofit, you can start refactoring and unit testing. As refactoring is not without risk, this is generally an approach for a more experienced developer, but it offers a sound way of prising out functionality and learning key areas in small parts.

This is also a natural point to start fixing any bugs found during a test retrofit.

National Service (also known as Jury Duty)

Have a couple of staff on short rotation (2-4 weeks) covering support & maintenance, even on areas they don’t know. This is generally reserved for experienced team members who can assimilate new areas quickly but is a great “leveller” in cross-training your team in hot areas.

Risks are generally around decision-making, customer responsiveness, turnaround time and overall lowering of performance but these are all short-term. I’ve often used this approach on mature teams to cross-train into small knowledge gaps or newly acquired legacy functional areas.

My preferred approach is to stagger allocation so that you have one member on “primary” support whilst another provides “backup” each sprint. The member that covered primary in sprint one is on backup in sprint two (expand this to meet your required capacity).

3-6 months of a national service approach should be enough to cover the most critical functional gaps on your team, based on customer/user demand.

Tour of Duty

Have your staff work on longer term rotations (1-6 months) working on a functional area or feature as part of a feature team. This couples up with the “feature lead” approach.

Again, this is an approach that is very useful for mature agile teams that understand and support working as feature teams but can be introduced on less-experienced teams without too much pain.

The tour of duty can also work beyond an individual role. For example a developer taking a 3-6 month rotation through support or testing will provide greater empathy and understanding of tester and user needs than simply rotating through feature delivery. (My time spent providing customer support permanently changed my attitude toward cosmetic defects as a developer)

Adjacency

This takes a lot more planning and management than the other approaches described but is the most comprehensive.

From the areas each team member knows, identify where adjacent functional or technical areas lie and then use a subset of the approaches described here to build up skills in that adjacency.

Develop a knowledge growth plan for every team member sharing and leveraging their growth across areas with each other. Although this is an increase in coordination and planning, the value here is that your teams get to see the big picture on their growth and have clear direction.

All these patterns have a selection of merits and pitfalls. Some won’t work in your situation and some may be more successful than others. There are undoubtedly more that could be applied.

Starting in your next sprint, try some of what’s provided here to develop shared ownership and knowledge transfer for your teams. Pick one or more patterns, figure out the impacts and give them a try.

(Coming soon “Building a Case for Collective Code Ownership” – when solving the technical side isn’t enough)