Black Holes & Revelations

Reading time ~ 3 minutes

Have you ever had to deal with a black hole on your team?

“As predicted by general relativity, the presence of a large mass deforms spacetime in such a way that the paths taken by particles bend towards the mass. At the event horizon of a black hole, this deformation becomes so strong that there are no paths that lead away from the black hole” – Wikipedia

I’m not a physicist so here’s a simplified view that I can fit in my smaller brain:

Black holes are like huge “gravity traps” sucking in all energy from the surrounding area. Energy and mass are drawn toward the event horizon, sucked in and lost forever. The more they take in, the larger or denser they get.

Here’s some cool stuff I learned from Karl Schoemer a few years ago.

A team undergoing change can be coarsely divided into 3 behaviors: Design, Default and Defiant/Detractor.

• The “Design” population are your role models: your supporters & change agents – but be aware, some may have short attention spans or become zealots. This is up to 20% of your population.
• Those following the “Default” behavior will sit on the fence: “What… ever”, “it doesn’t apply to me” and “I’ll carry on as I am, thank you” are all common “default” responses. Typically this is 70% of your population!
• “Defiant/Detractor” behavior exhibits extreme symptoms including shouting, arguments, tantrums, sabotage, threatening to leave and pulling everyone else down with them. Less extreme responses include focusing on the minutiae, public cynicism and endless debate without action. Whilst this behavior may seem prevalent, it’s often as little as 10% of your population!

Now let’s return to the Black Hole. In space, black holes are invisible – only their effects can be seen. In change management, we simply fail to recognize and identify them.

Human black holes must be understood and handled with extreme caution.

For those inexperienced with black holes, your instinct will be to try to defuse them. You must spot when you are feeding a metaphorical black hole, rewarding negative behavior by pouring your finite energy and resources in. Feeding black holes gives them additional credibility in front of their peers – their gravity trap grows ever larger.

Lean values time… Eliminate waste! – Where are you wasting your energy?
If you removed the energy feeding a black hole, would it eventually burn out?
In human change, detractors usually either get with the program or leave.

If you’ve read some of my prior articles you’ll know that whilst I appreciate good people, if your behavior and attitude aren’t up to scratch, all the technical prowess in the world is unlikely to make me want you on my team.

Some black holes may be an almost permanent rift in space. Work to minimize their impact and sphere of influence rather than offering more fuel. Consider using them as your “professional cynic” – your sounding board for the detractor response – but be aware this is a lot like playing dodgeball with a burning coal. It’s usually safer to move them away from the powder magazine instead.

Where could your wasted energy be better spent?
Simple! Use it to shift the center of gravity on your team away from the black hole.
Partner with your “design” members as a team and swing your population of defaulters toward your chosen direction. Some may be pulled toward or into the black hole but work on the overall gravity shift to bring the team around.

If you don’t have sufficient design weight to adjust the center of gravity right now, go digging for more – one person at a time if needed. At some point you will be able to tip the balance.

(Oh – a nod to Muse for inspiring the title of this post)

Escaping the Oubliette (Part 1a) – Debt Prevention

Reading time ~ 2 minutes

This is a partial re-post of Escaping the Oubliette (Part 1). I’ve split the article into smaller readable components.

Great, I’ve got my incoming defect strategy nailed.

Now how do I prevent defects and debt in new code?

In 5 words…

Continuous attention to technical excellence.

Here are my top 7 (there are plenty more):

  1. Acceptance Criteria – Be really disciplined on your acceptance criteria & acceptance tests, team up with Analysts, Testers, Product Owners if you have them and attack your stories from every angle. A good approach to this is a “story kick-off” where the whole team dismantles a story before starting.
  2. Thinking Time – don’t just start coding right away: task things out, try the 10-minute test plan, discuss your approach with your peers and, for more complex or large items, try the “just enough design” approach.
  3. TDD – It’s hard to start but has an immense impact (a minimal test-first sketch follows this list). I’ve just seen a team complete their first project using TDD. 3 weeks into their final round of post feature-complete testing, their defect run-rate hasn’t had the testing spike seen on prior projects. In fact they’re keeping on top of all new incoming defects and have time to start paying down the historic backlog.
  4. Pair Programming – Do it in half-day trial chunks if you don’t have the stomach for going full-tilt. I’ve performed remote pair-programming with colleagues across the Atlantic using decent phone headsets and online collaboration tools for hours at a time. The net result of 2 days of remote pairing was finding and fixing about 10 extra defects in a thousand lines of code that neither of us would have found coding alone.
  5. Peer reviews – there is still a huge space for these in agile teams. But here’s the thing: be really tough. A peer review is not a hurdle; it’s a shared learning exercise. Functional correctness is actually the smallest component of a peer review. You should trust your developers that far. But there’s a whole series of other aspects to review – see “The Joy of Peer Reviews”.
  6. Small tasks – I once worked with an outsourced team who, when taking work, would disappear into a hole for 2 weeks and return with a single task in our configuration management system containing edits to 200+ files, with multiple condensed edits per file. My rule of thumb is one reviewable task per activity. If you’re going to add new functionality and refactor, that’s 2 independent tasks that can be identified and reviewed separately. This means you should be able to easily deliver 2 reviewable, closable tasks per day.
  7. Fast Builds – make it insanely simple for a developer to perform an incremental build that validates new code against the latest main code line (small tasks are a big help here). This includes the right subset of unit and functional tests. Aim for 30 seconds or less between hitting the button and seeing the first results.
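
To give item 3 a concrete flavour, here’s a minimal test-first sketch in Java with JUnit 4. The DiscountCalculator class and its pricing rule are invented purely for this post; the point is that the tests are written first and describe the expected behaviour, and the production code exists only to satisfy them.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Written first: these tests describe the behaviour we want before any
    // production code exists.
    public class DiscountCalculatorTest {

        @Test
        public void ordersOfOneHundredOrMoreGetTenPercentOff() {
            DiscountCalculator calculator = new DiscountCalculator();
            assertEquals(90.0, calculator.discountedTotal(100.0), 0.001);
        }

        @Test
        public void smallerOrdersPayFullPrice() {
            DiscountCalculator calculator = new DiscountCalculator();
            assertEquals(50.0, calculator.discountedTotal(50.0), 0.001);
        }
    }

    // Written second: the simplest implementation that makes both tests pass.
    class DiscountCalculator {
        double discountedTotal(double orderValue) {
            return orderValue >= 100.0 ? orderValue * 0.9 : orderValue;
        }
    }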

In the next article in this series I’ll focus on “Tailing” – how do you start reducing the old defects?

The Joy of Peer Reviews (Part 1 – Code)

Reading time ~ 2 minutes

The idea that pair programming replaces peer reviews is a myth, in the same way that “agile projects have no documentation” is a myth.

From my experience peer reviews continue to hold a vital place in agile development and software craftsmanship. Unfortunately they are often misunderstood or misapplied.

“Humanizing Peer Reviews” by Karl Wiegers is the best primer I’ve read so far on peer reviews so I’m not going to duplicate Karl’s efforts – I strongly recommend a thorough read. In fact, print a copy and give one to every member of your delivery team to read and discuss.

Like everything else in modern software development, peer reviews are a collaborative team learning experience. Reviewing code properly means both the reviewer and reviewee walk away having learned something and improved their craft.

With code reviews, beyond reviewing for functional correctness (the simplest, most obvious and potentially quickest part of a review), there’s a selection of considerations I expect reviewers to look for (there are plenty more); a short annotated example follows the list.

  1. No code without tests
  2. Good variable naming
  3. Correct use of classes and interfaces
  4. Small methods
  5. Adherence to standards and conventions
  6. Style consistency across the team
  7. Readability
  8. Good test coverage from small tests
  9. Test code follows the same quality standards as production code
  10. Tests describe expected behaviours
  11. All tests pass
  12. Evidence of some test preparation to consider boundary cases, failure modes and exceptions
  13. Clear exception handling, failure modes and explicit boundaries
  14. Sensible error messages
  15. Code smells
  16. Performance risks or tuning opportunities
  17. Security or other “ility” issues.
  18. Opportunities to learn new tricks
  19. New good practices & patterns
  20. Functional correctness
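
To make a few of these concrete, here’s a contrived Java fragment (invented for this post, not taken from any real codebase) with the checklist items a reviewer might raise noted inline:

    // A contrived fragment with review comments keyed to the list above.
    public class AccountService {

        // Items 2 and 7 (naming, readability): 'd' and 'amt' say nothing
        // about intent; 'balance' and 'withdrawalAmount' would.
        public double doIt(double d, double amt) {
            // Item 13 (explicit failure modes): silently returning the old
            // balance hides an error the caller needs to know about.
            if (amt > d) {
                return d;
            }
            return d - amt;
        }

        // Items 1 and 10 (no code without tests, tests describe behaviour):
        // the review should also ask where the test is that documents what
        // happens when a withdrawal exceeds the balance.
    }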

Once you have cultural acceptance of the breadth of technical peer reviews, develop your own checklist that everyone supports. Remember that the goal of a review is to share improvement opportunities, not for lazy coders to have someone else find their bugs for them or for staff to step on each other.

Now…

If you think peer review of code is problematic for your teams, I guarantee that document peer reviews will be a whole lot worse! – see part 2.

Lessons in Application Performance (Part 2)

Reading time ~ 3 minutes

“Captain, we have a customer crisis, get your passport, we need you on the ground tomorrow.”

This would have been fine if it hadn’t been Thursday night, if I hadn’t already been pulling 14-hour days for about a month, and if I hadn’t promised to take my family out at the weekend to make up for it. – Sounds like something out of The Goal, doesn’t it?

A couple of years after my first major lesson in application performance, and now working for a different company, one of our key customers in South America was trying to get their staff through critical money-laundering certifications before a fixed deadline. If the deadline was missed, huge fines were on the cards.

Trouble was, hardly anyone could get onto the system; it kept going down.

Our implementation team had taken our profile and scaling recommendations, added some contingency and specified the systems for the customer. The customer wasn’t happy with the cost, so they’d gone for a cheaper processor and database option, which made us uncomfortable, but this was a huge deal for us. (It was an early version of SQL Server on Windows 2000 and they were trying to scale to nearly 10,000 users.)

I arrived at the customer site. A really great, friendly bunch of people and one of the most bizarre office locations I’d ever been to. The office had a transient community of something like 1,000 staff with no other businesses anywhere nearby. Surrounding it was an entire cluster of lanchonetes that opened at lunchtime – solely to serve this building.

They’d taken the system offline for the weekend and were running their own testing using LoadRunner. Having done scalability tests for customers before, this seemed pretty simple: ramp users up in waves to desired levels, test, ramp up more, test again, tune, retest etc. We recommended a transaction rate of about half a transaction per user per second as a “heavy” load for this type of application.

We reviewed the log files; the system was falling over at about 1,000 users – way below where it should have been. I dug into their application servers and modified the JVM parameters for optimum heap size, garbage collection etc. – I’m pretty sure this will work.
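
For flavour, that tuning pass amounted to settings along these lines (the values are illustrative, not the actual figures from that engagement, and they assume the application server picks its flags up from a JAVA_OPTS-style environment variable):

    # Illustrative only: pin the heap so it doesn't resize under load,
    # size the young generation relative to the old, use the parallel
    # collector and log GC activity so we can see what it's doing.
    JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:+UseParallelGC -verbose:gc"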

Tests showed nothing abnormal. We’re looking good to move forward.

Let’s get the users back on…

10:30 AM Monday morning – The system goes down again.

We take a look at the logs.

Within the space of about a minute, nearly 5,000 users tried to simultaneously log into the system.

Back then, large-scale e-commerce event ticketing systems were not common. For an internal system like this, none of our team ever expected we’d have that kind of load just on login. Even 1,000 closely-packed logins seemed wildly unlikely.

What were they doing?

We talked to the managers about the users. We established that staff in all branches had two 30-minute breaks per day, in two patterns (half the staff at a time).

OK, so what…

This was the only time the staff were able to use their systems for this certification. The rest of the time they were serving customers!

These people’s breaks were precious; they had a deadline to meet for mandatory certification and this was the only time they could do it.

Solution: longer term, we fixed the performance of the login routines so it would never happen again, but to meet their short-term goals the management team staggered staff break times during the certification period. (Oh, and I’d got the sequencing of one of the JVM HotSpot parameters mixed up, but that wasn’t so relevant.)

Problem solved!

Lessons:

1: When designing systems for performance & scalability (even internal ones), you need a good understanding of peak load based on real usage profiles – including extreme ones. This might seem obvious now but to us back then it wasn’t. We thought we knew how the system was going to be used.

2: Spend a day with the real users, not just the “customer”. Make a point of understanding their goals and constraints. If you’ve not done so before, I guarantee you’ll learn something unexpected.

3: Sometimes the best solution isn’t technical. In this case, a simple work-scheduling change saved our customer a potential fortune in hardware & licenses.

4: Just as in my first performance lesson: if load is problematic in some areas, build safety margins into the system.

Just Enough Design

Reading time ~ 3 minutes

Some years ago at a prior employer I had the luxury of working with a team delivering a large green-field Java & Oracle project. The requirements were complex and the interfaces, APIs and business logic all needed some pretty exotic thinking to make everything work.

Prior to that project we’d delivered plenty of relatively simple work and been through requirements, design, code, unit test, integration, system test, documentation etc many times. We were generally a “pretty good” team.

We hired a new member – a very experienced and hands-on architect. He brought the team a whole load of the knowledge we were looking for, and more…

After being on board for about 2 weeks he called a meeting with the entire team, hauled us into a room and pointed out just how poor we were at proper design. He then took control of the situation: he developed a series of design spec templates, guidance and examples, and got the team fully ramped up on UML, capturing design decisions, practices, patterns – the works.

Using our new design knowledge and tools, we moved onto the first critical phase of our green-field project in 2 groups.

Group 1 had to get a working proof of concept to the customer in a matter of weeks; group 2 needed to start designing the way more complex second round of features.

For group 1 (a small pilot team of 2), one of the team did about a week’s research, wrote up the basics and hit the ground running (no real design). Group 2 were not allowed to touch a line of code until the designs were complete!

From memory, start to finish; that first phase took about 3 months.

After the initial work was completed, both groups 1 & 2 progressed onto the next round of features based on the design efforts group 2 had completed.

After about 2 weeks we realized we were having to sacrifice one of the team (our feature lead!) almost full-time to maintain the designs. Coding was completed in a total of about 6 weeks: the fastest coding turnaround we’d ever had for something of this scale, and the functionality was way harder than the first round of work.

After our crash-course in the pain of full-on software design,  our architect reconvened the team to lead a design practices brainstorming session.

“OK, now you know how to do proper software design; of the tools, practices and documents you used, which do you want to keep and which do you want to ditch?”

Our management seemed to have had the foresight to allow our architect this social experiment, knowing full well that the net result would be a major overall team improvement (the same manager also helped us develop successful business cases for major refactoring efforts – a pretty forward-thinking guy).

So what did we keep and what was our philosophy?

Philosophy first:

The greatest value in design after the fact is not in what was implemented but why we chose to do it that way (and why not another way).

The second greatest value in design after the fact is for team members (especially new joiners or maintainers) to get a foothold into the codebase and be able to navigate around safely.

With these cornerstones in mind we kept a few things…

1: High level architecture – a verbal or pictorial summary of the general concept and approach, often just a photo of some legible whiteboard sketches – our first foothold.

2: Top level flow – a sequence diagram defining the overall flow of responsibility between actors – our main “ladder” into the codebase.

3: Design decisions and rejections (in a wiki/threaded discussion) – why we chose to do things one way and why we chose not to do others. Since learning the “why & why not” approach we’ve saved days of ramp-up and maintenance pain on projects.

4: Complex algorithm annotations – for the really gnarly bits (avoid this where possible). Draw pictures for these where you can.

5: Public interfaces – as peer and tech-author-reviewed Javadoc post-implementation (a small sketch of the style follows this list). I like public interfaces – they’re a great long-term commitment to communicate in a given stable way. In this case they were also a commitment to our customer. Doing a decent job on these saved a world of support pain later.

6: Unit & functional tests – yes, these are design too!
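
As a flavour of item 5, here’s the sort of Javadoc-first public interface I mean. The names are invented for this post and have nothing to do with the real product; the point is that the documentation carries the commitment (what callers can rely on and what happens at the boundaries) rather than describing the implementation.

    import java.math.BigDecimal;

    /**
     * Looks up published unit prices for product codes.
     *
     * Implementations must be safe to call from multiple threads and must
     * never return null.
     */
    public interface PriceLookupService {

        /**
         * Returns the current unit price for the given product code.
         *
         * @param productCode a non-null, non-empty product code
         * @return the current unit price, never null
         * @throws IllegalArgumentException if the product code is not recognised
         */
        BigDecimal currentPrice(String productCode);
    }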

That’s it! – we ditched a whole world of class diagram hell and parameter definitions. We could still sketch out basic class diagrams when needed but not the level of depth needed to generate code from a CASE tool. We ditched all the noise and blurb and we made it clear why the product was written and behaved the way it did.

So – give your teams an easy leg-up into your code and then explain why it does things rather than telling people what it should do.