In part 2 of this series I concluded that automating the detection of intermittent, random or non-deterministic tests (aka Sporadics) comes with unforeseeable extra costs. It may serve as monitoring for Sporadics and might yield information to rank them and decide which should be solved first. But if the efforts stop there, one has basically invested in managing the status quo instead of changing it for the better.
Before proposing a possible solution to the Sporadics problem I need to elaborate on how a test becomes a Sporadic. Understanding this will provide hints for the solution to be established.
The basic assumption is that all tests were successful when they were first added to their test suites. Otherwise Sporadics would not be the topic to talk about …
Suppose there is a test which ran successfully for a fair amount of time. Weeks or even months. Everything was fine until the day it turned red for the very first time. A test failure occurred. Nothing special. What happened back then? Someone might have investigated the issue. After some thorough work they came to one or more of these conclusions:
1. The test failed because of a bug in the test; the bug got fixed
2. The test failed because of a bug in the production code; the bug got fixed
3. The test ran successfully again when the complete test run was repeated
4. The test ran successfully when repeated in isolation, so there supposedly was no issue with the test or the code under test itself
5. The test failure does not relate to the change made to the production code at all. Strange, but well (imagine a shrug here)
6. There was a filer outage and the test was not able to read or write a file, or there was some other issue with a part of the infrastructure the test was using
7. You name it
(I will not go into detail about inherently fragile tests, some of which could be a great source of Sporadics. Nor will I elaborate on possible root causes that would make a test intermittent. That is not the point of this post; I will save it for later ones.)
We are not talking about the first two items in this list. These are the good cases where the safety net worked and the required actions were taken.
We are not talking about the occurrences where a real root cause analysis was made and the problem was found and fixed. These things probably happen more often than not, especially if finding 5 was not accompanied by finding 3 or 4, but not every time, as the number of Sporadics in a corpus of tests will tell you.
Item number 6 would be an easy catch: an infrastructure issue. The people maintaining the infrastructure fixed the issue, or it was a temporary one. The test itself has been good all the time; no one has to do anything about it. Surely it will be green again as long as the infrastructure issue does not reappear.
Items 3 and 4 tend to be soothing enough for the person investigating the issue that no further actions follow. Looks good now. So, just merge into mainline. Must have been some sort of hiccup.
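To make findings 3 and 4 more tangible, here is a minimal sketch of one classic mechanism behind them: static state shared between tests. JUnit 4 and all names are assumptions of mine for illustration; it is the pattern that matters, not the code.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test with hidden shared state.
class Sequence {
    static int next = 0;                  // survives across tests in one JVM
    static int next() { return next++; }
}

public class SequenceTest {

    @Test
    public void firstValueIsZero() {
        // Green when the class is freshly loaded (an isolated rerun, finding 4),
        // red whenever an earlier test in the same JVM already called
        // Sequence.next(). If the set or order of previously executed tests
        // differs between runs, repeating the whole run may pass as well
        // (finding 3).
        assertEquals(0, Sequence.next());
    }
}
```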
Item number 5 consumed the most time investigating. It looks strange, but a rerun, standalone as well as part of its test suite, succeeded. Somehow it leaves a bad taste in the mouth. But hey, didn't it succeed all the time? And now it does again. Let's leave it alone and do some important work, what d'you think?
While item 6 is a bad signal in itself, items 3, 4 and 5 are the ones that could break the company's neck. If we are lucky they "only" signal issues with the tests themselves. Some hidden dependency that appears in strange situations, just on days of certain signs of the zodiac while the moon is in a particular position … you get the idea. If we are not that lucky they only represent the tip of the iceberg, and there is some non-deterministic behavior in our production code which might lead to loss of data or some other hazardous event in production use at a customer's site. Maybe you only have a hard time analyzing it, working long hours and weekends, or you might be facing a PR disaster or, even worse, a substantial claim for damages.
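Such a "moon position" can be as mundane as the wall clock. The following sketch (again JUnit 4, all names invented for illustration) stays green for months and fails only when it runs just as the date rolls over:

```java
import static org.junit.Assert.assertEquals;

import java.time.LocalDate;

import org.junit.Test;

public class InvoiceTest {

    // Hypothetical code under test: an invoice is due 30 days after creation.
    static LocalDate dueDate(LocalDate createdAt) {
        return createdAt.plusDays(30);
    }

    @Test
    public void invoiceIsDueInThirtyDays() {
        // Hidden dependency on the wall clock: the two LocalDate.now() calls
        // almost always return the same day, so this test stays green for
        // months. Run it while the date rolls over and the second call lands
        // on the next day: the test fails once, and every rerun succeeds.
        LocalDate expected = LocalDate.now().plusDays(30);
        assertEquals(expected, dueDate(LocalDate.now()));
    }
}
```

The failure is real, yet any rerun, standalone or within the suite, reports success: exactly the pattern of findings 3 to 5.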
Experience shows that test failures like items 3 through 5 get shrugged off more easily the more often they occur. These failures are somehow not taken as seriously as they should be. Quite often there is an argument, based on pure statistics or on the issue tracker records for the code under test, that it is not worth the time one would have to spend to find the root cause, or, if the root cause is known, to fix it. Personally, I have witnessed such a line of thought more than once. However, tests that failed this way a first time will start to fail a second time, a third time and over again. And there you are. Now you have a Sporadic. A known issue. An item on a list or a record in a database. And one by one they creep in.
Why could this happen? In an ideal world a developer would follow the Continuous Integration principle and thus would be eager to get rid of any test failure in mainline builds at any time. Since we are not living in such an ideal world, things are a bit different. There are test failures that won't be fixed, for the reasons listed above. It would be too easy to blame developers for not caring.
Developers find themselves confronted with various, concurrent and sometimes even contradicting requirements. Features always come first, for they are what the company gets paid for. Maintenance is important too. Not to forget quality. Or some refactorings. Issues with sporadically failing tests for features delivered long ago (several weeks or months) tend to get lost in all this. They just don't reach the level of urgency required to get the necessary attention.
In part 1 I complained about the attitude developers show towards the Sporadics. This attitude is influenced by the whole environment developers find themselves in. So just introducing yet another tool will not help. There is also the social aspect of this. Or, as +Steve Sether pointed out in his comment on the second post of this series:
"Since the problem is essentially a social problem, I think we should look towards social science for guidance. It's been experimentally verified that people discount any potential badness that happens in the future. So, bad thing in the future is much better (in people's minds) than bad thing in the near present."
So let's explore a possible solution next time around. The teasing will come to an end then ...
The opinions expressed in this blog are my own views and not those of SAP.