The Oubliette

Reading time ~ 2 minutes

An oubliette is a particularly unpleasant dungeon characterized by a well-like opening. During medieval times these were used “to forget” (oublier in French) about “unwanted guests”.

Imagine how you’d feel stuck at the bottom of an oubliette.

It’s probably one of the least inspiring places to be. Expect feelings of despair & hopelessness. You might find yourself asking “What do I do?”, “Why do I bother?”, “It’d be easier to just curl up and let it end”.

I’ve seen development teams end up in similar situations. Untamed defects build up around them over a period of years until it’s too late. The backlog is so deep there’s no hope of getting out. Even committing the entire team to just fixing bugs for over a year won’t save them.

It starts with a brittle codebase. A prior combination of poor architecture, lack of clear standards, lack of debt management and years of too much corner cutting. Often the underlying culprit is repetitive “business pressure” with a team that have not been empowered to say no. This mountain of cut corners and poor decisions offends the sensibilities of all but the most cavalier of developers but once the pit starts to get deep and squishy, what incentive is there to improve?

Your business cannot see delivery of features sacrificed to refactor the codebase; it adds no business value!

The gradual drip drip of quality problems continues as your ability to keep up with demand for new features slowly leaches away. The team slows down further, and it costs more and takes longer to develop as the business piles on the pressure for even more new features in less time.

The oubliette spirals towards oblivion and like a prisoner at the bottom of the well, your team and codebase become starved of quality.

Don’t let this happen to you.

(Here’s the first part of how to escape and prevent the oubliette)

Stop Working With Blunt Tools

Reading time ~ 2 minutes

Clarke Ching introduced me to a story of 2 woodcutters – one worked furiously but finished late whilst another stopped frequently to sharpen his tools and finished early.  (He tells it better than I do) – I think it’s actually based on an Abraham Lincoln quote:

“If I had eight hours to chop down a tree, I’d spend six hours sharpening my ax.”

Let’s think more about developing software the Lincoln way

What tool sharpening should you do before you start chopping?

If you paid off or prevented some of the debt you were facing before new work started, would it enable your project to run faster, more smoothly, with reduced risk, a lower chance of defects or a lower maintenance cost?

A previous employer had a great strategy. Every new release of the product had 2 top priority named features on the priority list for “cleaning up” and “levelling up”.

Cleaning up:

  1. Get all unit and regression tests passing (and keep them there).
  2. Address all build failures and warnings (and keep them under control).
  3. Delete all functionality, code and tests that will have been deprecated for more than 3 releases (and add alerts to functionality that will be removed in the next release).
  4. Fix all defects that put us below releasable quality before we’ve even started (and keep them there).

Levelling up:

  1. Raise the tools we use to those that are best supported, newest in the market or offer improvements to our working conditions.
  2. Raise the libraries we use to the latest supported versions and address any issues.
  3. Raise the platform versions we’ll support when the product ships and address any issues (and remove support for obsolete or out of date platforms).

No major release started full-tilt on functional work until these were cleared.
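The deprecation rule in the clean-up list is simple enough to automate. Here’s a rough Python sketch of how it could look, not what that team actually ran. The release numbers, the `deprecated` decorator and `legacy_export` are all made up for illustration, and I’m assuming a feature’s “age” is simply the current release number minus the release it was deprecated in.

```python
import warnings
from functools import wraps

MAX_DEPRECATED_RELEASES = 3  # from the clean-up rule above


def due_for_removal(deprecated_in: int, current: int) -> bool:
    """True once a feature has been deprecated for more than 3 releases."""
    return current - deprecated_in > MAX_DEPRECATED_RELEASES


def removed_next_release(deprecated_in: int, current: int) -> bool:
    """True when the next release will push the feature over the limit."""
    return current - deprecated_in == MAX_DEPRECATED_RELEASES


def deprecated(deprecated_in: int, current: int):
    """Hypothetical decorator that alerts callers when removal is one release away."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if removed_next_release(deprecated_in, current):
                warnings.warn(
                    f"{func.__name__} will be removed in the next release",
                    DeprecationWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Illustrative only: deprecated in release 9, we are now building release 12.
@deprecated(deprecated_in=9, current=12)
def legacy_export():
    return "exported"
```

Anything where `due_for_removal` returns True gets deleted during clean-up; anything one release away keeps working but warns its callers.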

Like all good practices, this isn’t new thinking; it corresponds to 3 components of the lean 5S strategy: sort (seiri), straighten (seiton) and shine (seiso). Rather than just describing what was done, here are some tangible benefits of the clean up & level up approach…

  1. There were no unpleasant surprises for our customers on a new release. We had a standard platform, support and deprecation policy and kept to it. Our customers liked it when we did predictable things.
  2. For the development teams levelling up was a common risk. Addressing this at the beginning of a project was a valuable de-risking activity. Where we hit critical problems, we could make clear, early upgrade decisions and where there were fewer issues we would develop full-time on updated versions throughout the project with regression on older supported platforms available from prior development cycles.
  3. Removing old parts of the product made life significantly easier for the teams. The reduced testing, regression and maintenance load allowed us to speed up development – much like scraping barnacles off the hull of a boat to help it run faster. Cleaning up also allowed us to take some sensible baseline code metrics before any new work started.
  4. Giving teams space and time to clean and level up before starting functional work had a positive impact on morale. We felt that we were trusted to “do the right thing” rather than “just ship it”. This empowered us to continue doing the right thing throughout the rest of our work.

Give your teams some time out to sharpen their tools and sort, straighten & shine the workshop before new work starts. It will make a difference to the performance of your team and the quality of the end result.

Dietary Manipulation (Part 1) – Pizza

Reading time ~ 2 minutes

**Note, I’m not responsible for any health issues related to over-eating, poor diet, allergies, existing or hidden conditions or the quality of the pizzas ordered in. Eating pizza during training is a personal choice.

Covering story sizing in ~30 minutes needs creative ways to get through people’s mental road-blocks.

On a pure estimation course, attendees are prepared to expand their estimation skills with no “selling” needed. However, if it’s a small component of a multi-day “agile” course, people find it hard to fit in their heads. They need to mentally accept the basic concepts and then learn by doing.

I’ve developed a novel (and not very healthy) technique for getting through the “I can’t fit it in my head” barrier.

At the start of a course, during discussions on logistics, lunch, dietary needs etc., I say to the group:

“OK, this afternoon we’ll be covering story point estimation. This can get a bit contentious for those of you that are mathematically minded…”

“…For lunch today, if you are willing, I suggest you have a large helping of pizza. Once you’re fully loaded with carbs, we’ll start the afternoon session. The estimation stuff is about an hour in – where you’ll be close to your carb peak and won’t feel like fighting any more :)”

Then I make sure there’s just over half a large pizza per person available for lunch.

This sounds bad, but it’s not mandatory and I’ve been completely honest with attendees – I’ve not yet had anyone not comfortable to eat pizza.  **(see opening disclaimer)

I’ve also quietly planted the seed that although it’s tricky and abstract, I don’t want any trouble during that part of the session.

What we’ve done is collaboratively neutralize the barriers that prevent the theory going in so that the teaching is more like osmosis. Once the basics are gently brought in with some slides, chat and rather lethargic Q&A, the teams put the theory immediately into practice on their workshop exercises.

Now they get it – but without the mental trauma!

After another day’s immersion with the points they defined during the workshop exercises marked in the corner of each of their story cards, the seeds have taken hold sufficiently for teams to accept what they’re doing and use again outside the classroom.

Never underestimate the power of food & drink when teaching and/or learning.

Are You Empowered?

Reading time ~ 2 minutes

Many large companies want to promote a culture of empowerment but what does that really mean?

In a small company or start-up you often truly are empowered to act beyond your boundaries.  In fact it goes beyond that, you’re responsible for acting fast.

Chances are if you don’t pick things up that need dealing with, either someone else will and leave you feeling distinctly mediocre or your team or company will suffer. Either way, the culture of empowerment in small companies transforms into shared accountability.

In a large corporation, does this really still work?

Whilst we may think this is a problem with corporate culture, it actually depends most on individual managers.

In a traditional hierarchical organization, telling your management staff that their teams are empowered sounds very noble and supportive but in reality it’s seen more like abdicating support. Pushing empowerment at this level usually means you want something done for free with no risk to yourself.

There’s a difference between staff being told they are empowered and actually being empowered. In fact, as a senior manager, empowering your staff requires you to make it safe for your teams to act. One great way to do this is to lead by example.

In a conversation with Dan North early last year, his quip really stuck with me…

“You are anointed with empowerment, go forth and be empowered.”

Here’s what’s often hidden behind the words…

  • There’s an approval process you need to go through beforehand.
  • When you’re done, I want a full report with metrics on my desk and a 1 slide PowerPoint summary for the executive team.
  • Here’s a catalog of things you can’t do or touch and people you can’t speak to.
  • Don’t screw up or it’s your ass on the line.

Let’s break that mindset…

First, take a look at your constraints. What things are you really not allowed to change? Probably nothing – as long as you can demonstrate something better.

Sadly, most of us have a mortgage and/or family to sustain, a career to maintain, are on the line for getting stuff delivered and are way over-stretched.  That’s not a very empowering position.

Truly empowered people are able to take calculated risks and perform valuable actions that they know are the right thing to do, they ask for forgiveness & approval later if needed and most of all, they have their manager’s unflagging support, even when they fail.

As a Manager, don’t abdicate your responsibilities to your teams; give them the tools and safety they need to really be empowered so that they can make a difference and feel supported in doing so.

Freedom To Vent

Reading time ~ 2 minutes

Sometimes you just need to let things out.

When weasel politics come into play (instead of just the usual politics), my timer goes off. I know I should be more moderate but I don’t tolerate it, I choose not to play political games, they benefit nobody.  I see it, recognize it and want to call it out publicly.

My ability to be incredibly hard and direct or just downright 4-letter offensive is legendary (but rare). My colleagues coined the term Bad Captain for when the Tourette’s really kicks in – but that part doesn’t happen in the office.

Sometimes it’s the right approach – I’ve seen public calling-out work wonders on particularly toxic characters but more often it’s a career-limiting and collaboration-damaging thing to do.

If you face an underlying need to flame that doesn’t go away – you know – that urge to write down everything that’s wrong, calling out someone’s personal shortcomings, copying the entire world (and their bosses) and hitting send – remember the following advice…

It’s a bit like peeing in your pants as a kid, it feels warm for a few seconds but gets cold and uncomfortable very fast.

Rather than allowing me to blow a fuse, a former boss and I developed an understanding.

He recognized my need and would allow me to spend that magical half hour furiously and perfectly crafting the necessary barb-laden email, knowing that writing it was critical to my corporate sanity. He was even willing for me to hit send!

As long as it only went to him.

After throwing my mail-bomb at his inbox, he’d wait for me to grab a cup of tea, take a break and calm down and would then come over…

“I’ve read your mail. There’s some valid points in there. Do you want me to do anything with this or just hit delete?”

That simple response was everything I needed in order to release the pressure, ask for help, get a response and start getting the situation back under control.

If you have staff or team members with a relatively high level of professional passion,  provide them the freedom to vent in a safe environment but support them in learning how to control it themselves and when to pick up the phone.

Lessons in Application Performance (Part 2)

Reading time ~ 3 minutes

“Captain, we have a customer crisis, get your passport, we need you on the ground tomorrow.”

This would have been fine if it weren’t Thursday night, if I hadn’t been pulling 14-hour days for about a month already and if I hadn’t promised to take my family out at the weekend to make up for it. Sounds like something out of The Goal, doesn’t it!

A couple of years after my first major lesson in application performance, now working for a different company, one of our key customers in South America was trying to get their staff through critical money laundering certifications before a fixed deadline. If this was missed, huge fines were on the cards.

Trouble was, hardly anyone could get onto the system; it kept going down.

Our implementation team had taken our profile and scaling recommendations, added some contingency and specified the systems for the customer. The customer wasn’t happy with the cost so they’d gone for a cheaper processor and database option which made us uncomfortable but this was a huge deal for us. (It was an early version of SQL Server on Windows 2000 and they were trying to scale to nearly 10,000 users).

I arrived at the customer site. A really great, friendly bunch of people and one of the most bizarre office locations I’d ever been to. The office had a transient community of something like 1,000 staff with no other businesses anywhere nearby. Surrounding it was an entire cluster of lanchonetes that opened at lunchtime – solely to serve this building.

They’d taken the system offline for the weekend and were running their own testing using LoadRunner. Having done scalability tests for customers before, this seemed pretty simple. Ramp users up in waves to desired levels, test, ramp up more, test again, tune, retest etc. We recommended a transaction rate of about half a transaction per user per second as a “heavy” load for this type of application.
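To put numbers on that ramp, here’s a rough Python sketch. The half-a-transaction-per-user-per-second figure and the target of roughly 10,000 users come from this story; the number of waves and the `ramp_plan` helper are purely illustrative assumptions, not what we actually configured.

```python
TRANSACTIONS_PER_USER_PER_SECOND = 0.5  # the "heavy" load figure quoted above


def ramp_plan(target_users: int, waves: int) -> list[tuple[int, float]]:
    """Return (users, target transactions per second) for each ramp-up wave."""
    plan = []
    for wave in range(1, waves + 1):
        # Each wave adds an equal share of the target user count.
        users = target_users * wave // waves
        plan.append((users, users * TRANSACTIONS_PER_USER_PER_SECOND))
    return plan


if __name__ == "__main__":
    # Four waves is an arbitrary choice for illustration.
    for users, tps in ramp_plan(10_000, 4):
        print(f"{users:>6} users -> {tps:>7.0f} transactions/second")
```

At the full 10,000 users that “heavy” profile works out to 5,000 transactions per second, which gives a feel for what the cheaper hardware was being asked to do.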

We reviewed the log files; the system was dropping at about 1,000 users – way below where it should have been. I took a dig into their application servers, modified the JVM parameters for optimum heap size, garbage collection etc. – I’m pretty sure this will work.

Tests showed nothing abnormal. We’re looking good to move forward.

Let’s get the users back on…

10:30 AM Monday morning – The system goes down again.

We take a look at the logs.

Within the space of about a minute, nearly 5,000 users tried to simultaneously log into the system.

Back then, large-scale e-commerce event ticketing systems were not common. For an internal system like this, none of our team ever expected we’d have that kind of load just on login. Even 1,000 closely-packed logins seemed wildly unlikely.

What were they doing?

We talked to the managers about the users. We established that staff in all branches had two 30-minute breaks per day in 2 patterns (half the staff at a time).

OK, so what…

This was the only time the staff were able to use their systems for this certification. The rest of the time they were serving customers!

These people’s breaks were precious; they had a deadline to meet for mandatory certification and this was the only time they could do it.

Solution: Longer term we fixed the performance of the login routines so it’d never happen again, but in order to achieve their short-term goals, the management team staggered break times for staff during the certification period. (Oh, and I’d got the sequencing of one of the JVM HotSpot parameters mixed up, but that wasn’t so relevant.)

Problem solved!
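Some back-of-the-envelope arithmetic shows why staggering worked. With roughly 10,000 users split across 2 break patterns, about 5,000 people hit the login page together, which matches what the logs showed. The sketch below is Python; the larger group counts are hypothetical, since I don’t have the actual schedule the managers chose.

```python
def peak_login_burst(total_users: int, break_groups: int) -> int:
    """Worst-case simultaneous logins if a whole group logs in as its break starts."""
    # Ceiling division, so a remainder still counts towards the largest group.
    return -(-total_users // break_groups)


if __name__ == "__main__":
    # 2 groups is the original break pattern; 4 and 8 are illustrative staggerings.
    for groups in (2, 4, 8):
        print(f"{groups} break groups -> up to {peak_login_burst(10_000, groups)} logins at once")
```

Every doubling of the number of break groups halves the login spike, with no new hardware or licenses needed.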

Lessons:

1: When designing systems for performance & scalability (even internal ones), you need a good understanding of peak load based on real usage profiles – including extreme ones. This might seem obvious now but to us back then it wasn’t. We thought we knew how the system was going to be used.

2: Spend a day with the real users, not just the “customer”. Make a point of understanding their goals and constraints. If you’ve not done so before, I guarantee you’ll learn something unexpected.

3: Sometimes the best solution isn’t technical. In this case, a simple work scheduling change saved our customer a potential fortune in hardware & licenses.

4: Just as in my first performance lesson; if load is problematic in some areas, build safety margins into the system.

Lessons in Application Performance (Part 1)

Reading time ~ 3 minutes

At 5PM one evening about 10 years ago I had a call from the head of application performance at the company I was working for…

“Captain, I just thought you’d like to know you’re currently responsible for the most expensive piece of SQL in the company – worldwide…”

It was using about 95% of the CPU resource on 2 Superdomes. This was a production showstopper; no going home tonight…

Now this was embarrassing. I and one of my team were responsible for a recent complete refactoring of a key component on one of our global systems. We’d tuned the hell out of this thing. Despite it trawling a decade’s worth of live data – millions of records – it was so efficient that it returned in under half a second every time with minimum memory footprint, minimum disk I/O and minimum CPU load. We’d profiled it like crazy under all kinds of data shapes and volumes. We’d taken production transaction volumes, built in contingency and ploughed it through those and it was just screaming away, no problems. In fact it was one of the best bits of refactoring & performance tuning I and my co-developer had ever done.

Something was seriously wrong. There was no disputing the logs.

As always with these major production issues, I and my development partner pulled an all-nighter on calls to various application groups in the US.

We found an anomaly.

One query was being called about 2,000,000 times a minute!

All our profiling, production comparisons and performance data had been based on a peak load of about 1,000 hits a minute. As a key component of a worldwide order system we knew it would be heavily used and put a reasonable load on the system but this was way beyond our most unrealistic expectations.

All our tuning had been based on a known forecast usage through our standard order management system. The calculation engine was typically called once or twice for every order placed worldwide. (We’d forecast about 500 orders per minute and easily had capacity for twice that volume)

Further investigation revealed the source.

Earlier that week, a new application from another team had been integrated into production.

There were rumors that the VPs of each team didn’t speak to each other. Either way, although we knew members of the other team, they were on the other side of the world and we didn’t collaborate very often.

This new application was dependent on the same calculation engine. We’d spent some time training their developers on how to interface to it and they were really pleased with the results they were seeing. Once our knowledge transfer was done, that was the end of the story as far as we were concerned.

What we didn’t know was that their initial efforts were so successful that they had integrated it into their product as an auto-calculation system.

Every time a user tabbed from one field to the next on the order line details, it was performing a recalculation.

Now a typical order at this company contained about 100 order lines. And each order line contained ~20 fields. Our calculation engine was being put under nearly 2,000 times more load than had ever been expected!
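The arithmetic is worth spelling out. Here’s a quick Python sketch using only the figures from this story, with the simplifying assumption that the engine was originally called once per order (in practice it was once or twice).

```python
ORDER_LINES_PER_ORDER = 100            # typical order size from the story
FIELDS_PER_ORDER_LINE = 20             # approximate fields per order line
TUNED_PEAK_CALLS_PER_MINUTE = 1_000    # the peak load we had profiled against


def load_multiplier(lines: int, fields: int, calls_per_order_before: int = 1) -> float:
    """How many times more engine calls one order makes when every field tab recalculates."""
    return (lines * fields) / calls_per_order_before


multiplier = load_multiplier(ORDER_LINES_PER_ORDER, FIELDS_PER_ORDER_LINE)
projected_calls_per_minute = TUNED_PEAK_CALLS_PER_MINUTE * multiplier

if __name__ == "__main__":
    print(f"Load multiplier: {multiplier:,.0f}x")
    print(f"Projected calls per minute: {projected_calls_per_minute:,.0f}")
```

That lands on the 2,000,000 calls a minute we saw in the logs.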

Needless to say we had the team fix it fast.

They removed the “auto-price” feature and replaced it with a “recalculate” button on the shopping cart.

Lessons…

1: When you’re writing a system that will be integrated with multiple systems or uncontrolled third-parties, make sure there’s a mandatory part of the interface in place that requires the caller to be clearly identifiable and that your logging is user-friendly. Putting all blame aside, this immediately allows you to identify and isolate unique integration issues.

2: Don’t just train your users/customers how to use your system. Help review their approach and processes and once they’re up & running, get them to show you the results and walk you through what they did. Chances are they will try to do something exotic that you weren’t expecting and would make you squirm. Better to find it early and deal with it than allow it to become a production issue. Remember it’s up to you to be a role model in this collaboration.

3: Identify and set your performance constraints, expectations and limitations up-front. Consider even building them into the system in some configurable way and tell your users/customers what they are. In the days of denial-of-service attacks, secure coding requires us to put throttles into our systems to prevent them running away with resources (anyone ever tried to buy Glastonbury tickets online?). Even in smaller internal systems, it’s worth having some idea on volume and usage. Performance defects are notoriously difficult to resolve, are usually showstoppers and often require major architectural rework – we were lucky this time!

4: Many defects are found when your software is used in ways that you and your requirements didn’t foresee. Often that flexibility is a major asset or differentiator but if it’s not, consider putting up safety rails to limit your system to intended usage only.  If it “might be useful” to work another way in future, remember YAGNI – You aren’t gonna need it. If you feel you must have that potential flexibility, at least consider putting a lock on it that requires your customer or user to call up or understand and recognize the impact first.
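Lessons 1 and 3 fit together in code. Here’s a minimal Python sketch of a front door that refuses anonymous callers and throttles each caller with a sliding one-minute window. `CalculationGateway`, `ThrottledError` and the limit are hypothetical names and numbers for illustration; it isn’t what we built back then, and a real system would also need thread safety and proper logging.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class ThrottledError(Exception):
    """Raised when a caller exceeds its configured call budget."""


class CalculationGateway:
    """Hypothetical front door for a shared engine: every call must name its caller."""

    def __init__(self, max_calls_per_minute: int):
        self.max_calls_per_minute = max_calls_per_minute
        self._recent = defaultdict(deque)  # caller_id -> timestamps of recent calls

    def call(self, caller_id: str, now: Optional[float] = None) -> str:
        if not caller_id:
            raise ValueError("caller_id is mandatory")
        now = time.monotonic() if now is None else now
        window = self._recent[caller_id]
        # Forget calls that are 60 seconds old or more.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.max_calls_per_minute:
            raise ThrottledError(
                f"{caller_id} exceeded {self.max_calls_per_minute} calls/minute"
            )
        window.append(now)
        return f"calculated for {caller_id}"  # stand-in for the real calculation

    def calls_in_window(self, caller_id: str) -> int:
        """Calls recorded for this caller as of its most recent call."""
        return len(self._recent[caller_id])
```

Because the limit is tracked per caller, a runaway integration gets throttled (and is immediately identifiable) without starving the well-behaved callers next to it.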

Your Leader Sets The Tone For Your Team

Reading time ~ 2 minutes

Recently I quite openly and permanently expressed my deep frustration with another senior manager. What bugged me was the finger pointing, “Over the Wall” behavior when it was clear there was a mutual screw-up.

My response caused a lot of upset and whilst inflammatory and not entirely justified, it did galvanise the groups into just getting on with things.

I’ll reflect on a quote from a recent predecessor.

“It’s up to us to be the grown-ups here”.

I wasn’t, and I should have been, but it was hopefully a one-off (maybe it was the supermoon), perhaps necessary this time – who knows. The fact that it still bothers me says I was probably wrong – I continue to learn from my mistakes…

I have a quote from a leadership coach I learned from in a former life.

“A leader sets the tone for their organization”.

Her point was that my behavior goes way beyond a single team! I’ve seen this in every large company I’ve worked with so far. At some point a conflict forms between leaders for an unknown and often political reason. Once that rift is in place it becomes a defining part of the organization’s entire culture. The “us & them” barrier is erected and the rock hurling begins.

Teams downstream see this behaviour and believe it’s socially acceptable. They follow suit and perpetuate the problem.  When one or other problematic personality eventually moves on, do you really think that embedded culture will just naturally unwind itself?

It’s up to you at whatever level you’re at to cross the organizational chasm and drive out that attitude, one phone call, face to face conversation or collaborative relationship at a time. (more email is not the answer!)

Furthermore, we are all responsible for teaching our leaders to be positive role models for their teams. Call out bad behavior and get the parties to address their conflict. If not for the greater good of the company, at least for the personal and social well-being of the teams.

Work to understand and express the perspectives and motivations on either side of the rift. What’s driving the behaviour? Is there any misalignment on priorities and goals? If so, who can help solve them and how? What impact will that alignment have and how soon can we fix it?

Just as with trust, good organizational culture takes years to build and moments to destroy.

The World Needs Ditch Diggers Too

Reading time ~ < 1 minute

I once worked for a company that claimed:

“We only hire the best!”

– Interesting, did you know you shared a science park with Oracle, Google, Sun, GE and Yahoo?

So we have at least half a dozen companies (and probably more like twelve-dozen the world-over) where managers also claimed to “only hire the best”.

Somebody has to be stretching the truth here.

Now admittedly, some of the staff there really were spectacularly good but much like any other large company, most others were just good, some were average, and staff on the ground questioned the continued existence of the occasional one or two hangers on.

But this isn’t all bad news. In fact if you really only hired the best, chances are it’d be like casting a film where the entire cast are Oscar winners and everyone wants best supporting and lead titles. On paper it might look fantastic but I guarantee your production will be a complete nightmare and the end result would be pretentious and expensive!

Here’s an article by Kris Dunn I read a few years ago that really brought home to me how damaging the “only the best” approach can be when 80% of the time you simply need to get stuff done. (apologies if the formatting on the linked article is off).

Nobody Thanks the Drummer

Reading time ~ < 1 minute

A couple of weeks ago I went out to see The Wonder Stuff and The Levellers at a gig in Cambridge.

Yes I do still dwell circa 1992 – You should see my car!

On the way out I recognised The Wonder Stuff’s new drummer – Fuzz Townshend (formerly of Pop Will Eat Itself). This guy has been a key contributor to some of my favourite music for over 20 years so I stepped up to congratulate him on a fantastic gig.

He seemed genuinely shocked (and pleased) to be both recognized and thanked for his performance and took the time to discuss his thoughts on the likelihood of a PWEI reformation and on the whereabouts of their unreleased final album of material and was – like most mature rock stars – well worth taking the step to talk to.

It made me realize. Whilst most of the crowd had been stopping to have their photos taken with the guitarist, the recognition for the rhythm section that binds a band together is rarely the same.

On software development teams (and companies in general) this same pattern applies. How often have you seen someone tirelessly working behind the scenes, a complete hidden hero – perhaps only recognized by their own team – get passed by when the awards and recognition are handed out?

Take some time out this week to recognize your rhythm section.