Reproducible Heisenbug
Matt had just wrapped up work on a demo program for an IDE his company had been selling for the past few years. It was something many customers had requested, believing the documentation wasn't illustrative enough. Matt's program would exhibit the IDE's capabilities and also provide sample code to help others get started on their own creations.
It was now time for the testers to do their thing with the demo app. Following the QA team's instructions, Matt changed the Debug parameter in the configuration file from 4 (full debugging) to 1 (no debugging). Build and deploy completed without a hitch. Matt sent off the WAR file, feeling good about his programming aptitude and life in general.
And then his desk phone rang. The caller ID revealed it was Ibrahim, one of the testers down in QA.
Already? Matt wondered. With a mix of confusion and annoyance, he picked up the phone, assuming it was something PEBKAC-related.
"I've got no descriptors for the checkboxes on the main page," Ibrahim told him. "And the page after that has been built all skew-whiff."
"Huh?" Matt frowned. "Everything works fine on my side."
What could be different about Ibrahim's setup? The first thing Matt thought of was that he'd disabled debugging before building the WAR file for QA.
That can't be it! But it was easy enough to test.
"Hang on one sec here." Matt muted his phone, then changed the Debug parameter on his local deployment from 4 to 1. Indeed, upon refreshing, the user interface went wonky, just as Ibrahim had described. Unfortunately, with debugging off, Matt couldn't check the logs for a clue as to what was going wrong.
Back on the phone, Matt explained how he was able to do reproduce the problem, then instructed Ibrahim on manually hacking the WAR file to change the Debug parameter. Ibrahim reported that with full debugging enabled, the program worked perfectly on his end.
"OK. Lemme see what I can do," Matt said, trying not to sound as hopeless as he felt.
With absolutely no hints to guide him, Matt spent hours stepping through his code to figure out what was going wrong. At long last, he isolated a misbehaving repeat-until loop. When the Debug parameter was set to 4, the program exited the loop and returned data as expected. But when Debug was set to anything less than 4, it made an extra increment of the loop counter, leading to the graphical mayhem experienced earlier.
Horror crept down Matt's spine. This problem would affect anyone using repeat-until loops in conjunction with the IDE. Such programs were bound to fail in unexpected ways. He immediately issued a bug report, suggesting this needed to be addressed urgently.
Later that day, he received an email from one of the IDE developers. I found where it was testing the wrong boolean. Should we raise this as a defect?
"Yes! Duh!" Matt grumbled out loud, then took to typing. And can we find out where this bug crept in? All projects released since that time are compromised!!
As it turned out, the bug had been introduced to the IDE 2 years earlier. It'd been found almost immediately and fixed. Unfortunately, it'd only been fixed in one specific branch within source control-a branch that had never been merged to the trunk.
[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!