Bugs (and how to talk about them)

This is potentially the greatest thing I’ve seen all week (partially because I’m currently facing a Mandelbug).  How a team handles bugs is a strong signal about its overall cohesiveness and quality, and for any product person, there is a lot of personality management that goes into bugfixing.

Technically, a bug is a mistake.  But what people forget is that it isn’t always a programming mistake; product people are equally at fault if the issue is that they didn’t push the product far enough or specify edge cases.  So fixing bugs is an art: you have to get it done quickly and effectively (which means finding out where the problem is) without letting blame-casting get in the way of the actual fix.

My product person approaches:

“A bohrbug (named after the Bohr atom model) is a bug that manifests itself consistently under a well-defined (but possibly unknown) set of conditions.”

So one of two things happened here.  One, you documented the behavior you wanted and the engineer didn’t make it happen, in which case you nicely say “Hey, this doesn’t seem to be behaving to spec, it should blankedy blank blank.”  Two, there exists an edge case (or regular case) for which you didn’t define behavior, so the engineer made something up (or did nothing at all), in which case you should immediately apologize and get back to speccing.  “I never even thought about that possibility.  Sorry about that, yo; let me go noodle on how to fix it.”

“A mandelbug (named after fractal innovator Benoît Mandelbrot) is a computer bug whose causes are so complex that its behavior appears chaotic or even non-deterministic. This word also implies that the speaker thinks it is a bohrbug rather than a heisenbug.”

These bugs are the most likely to provoke an argument.  Take, for example, my week.  Two people at the client can reliably reproduce the bug (and take screenshots of it), but since I’m not there, I can’t actually see the error.  And neither I nor the engineers, using the same OS, browser, and version, can make the damn thing break.  It is tempting to tell the client that they are insane or doing something wrong, since the code itself is static: in theory, what it does for me, it should do for them unless they are fundamentally misbehaving.  The key product challenge here?  Accept that other people are probably not as crazy as they appear.  You have to accept that the bug exists, though you can still choose whether or not to fix it.  Which you may very well choose not to, especially if it only seems to occur for one in every million people.  “I totally believe that you’re seeing it, I just can’t reproduce it, which means I can’t fix it.  I’m going to try something on this end…did that fix it?  Nope?  OK, maybe this…still no?  Alright, since it seems to happen only to you, until we get back some other reports so we can figure out where it is coming from, we’re going to have to just let this one go.”

“A heisenbug (named after the Heisenberg uncertainty principle) is a computer bug that disappears or alters its characteristics when an attempt is made to study it.”

It only sort of counts, but I often see these happen between QA and Prod, where theoretically nothing is different and yet the bug seems to appear only on Prod.  You clearly can’t test the code on Prod, so you need to test it on QA, but it doesn’t break on QA, so…you quietly go insane in a corner.  Again, rather like a mandelbug, the product person’s role here is to keep everyone calm, act rational, and not let the inherent insanity of the bug infect people.  “Hmm, OK.  Let’s not worry too much about why things are operating differently here, and just try to deploy a fix to QA.  As long as it doesn’t wreck QA, we’ll deploy to Prod and see if that fixes it.  Preferably at 3am, so we can revert if this gets worse.”

“A schrödinbug is a bug that manifests only after someone reading source code or using the program in an unusual way notices that it never should have worked in the first place, at which point the program promptly stops working for everybody until fixed.”

This bug also frequently makes people feel insane.  “The link isn’t turning blue.”  “It wasn’t meant to turn blue.”  “I feel like it used to turn blue, I swear.”  “Well…do you want me to make it turn blue?”  My best advice?  Just decide what you actually want it to do and make it do that; let history be history.

“The phase of the moon bug is sometimes spouted as a silly parameter on which a bug might depend, such as when exasperated after trying to isolate the true cause.”

This is more of a bug game than an actual bug.  If you have, for example, a mandelbug, it is often amusing to try to figure out what could possibly be different between two environments to cause the issue.  I’m blaming mine on the fact that they are in Kansas City and therefore covered in BBQ sauce most of the time.

“The term alpha particle bug derives from the historical phenomenon of soft errors caused by cosmic rays.”

We’ve sort of reclaimed this one in my experience to mean “any bug that happens only a couple of times, then mysteriously fixes itself”.  Best advice?  Let sleeping dogs lie.  There is a temptation to rip in there and try to figure out why you saw it those few times, but really, shouldn’t you be building something else?

an N of 1: in statistics, a sample size of 1 has almost no validity. in life, this is less true.