A quarantine has been set up with these rules:
- any regression would be a possible reason to close mainline
- if the regression turns out to be
- trivial then mainline will be closed until the fix arrives and proves to remove the regression
- non-trivial and a fix would take some time and it is
- a reproducible regression in a third party component then it will be quarantined for two weeks and mainline remains open
- a reproducible regression in the application itself then it will be quarantined for two days and mainline remains open
- a sporadic regression and
- it showed up the first time then it will be observed as a known regression - no fix will be requested and mainline remains open
- it showed up the second time then it will close mainline and an analysis will be requested, if analysis shows the fix would be
- trivial then mainline remains closed until the fix arrived and proved to remove the regression
- non-trivial then it will be quarantined and mainline will be opened again
- a quarantined regression that gets not fixed until its time limit has been exceeded will close mainline
The original idea would work if certain assumptions would be true:
- "... quarantine would serve as a to-do list of open issues to be worked on with highest priority"
- quarantine rules will be strictly applied
"If there is a culture where these things are always overruled by something seemingly more important then a quarantine will just be another issue tracker with loads of issues never to be handled or like the to-do lists on dozens of desks which never will be worked on to the very end. It's just another container of things one should try to do when there is some time left."And exactly this is what happened. And it made me think. Either the concept of quarantine was a dead end or the way it has been incorporated was plain wrong. What are my observations?
First, the set of rules is non-trivial. It takes some time to understand which possible cases there are and whether or not a regression would close mainline, will be quarantined or just observed. This rule set is hard to remember for the people to enforce it as well as for the people that would need to fix regressions.
Second, if you consider a green mainline the goal to achieve and if you accept the fact that red pipeline runs require a thorough analysis of what went wrong then the design of this quarantine has its flaws. It would allow for red pipeline runs to enter mainline in the case of a Sporadic occurring the first time. Any deterministic regression gets handled by closing mainline or at least putting it under quarantine while non-deterministic regressions could enter mainline.
Third, there has been no tool to put regressions in quarantine and to report on them with every pipeline run in order to make sure newly failing tests show up easily. Instead all test failures were present and required analysis. Many pipeline runs were analyzed that only contained quarantined test failures causing a lot of wasted work.
Forth, the rule set wasn't enforced completely. Regressions that reached their maximum time in quarantine did not cause mainline to be closed. Instead they started littering pipeline runs and along with the missing quarantine reporting it became even harder to find newly failing tests thus regressions a single change introduced.
Obviously this quarantine has been a failure both in concept and in execution. There needs to be another configuration of it to make it work. There needs to be something very simple and straight so that everyone could memorize it and that a tool could enforce it. And the latter would be the basic shift in my mind. Last time I was talking about not having an automatism but to add a human factor:
"Instead of having an uncompromisable automatism which would just close mainline until a proven fix arrived a human would be able to weigh different aspects against each other and to decide not to close mainline even if a Sporadic popped up"And this was my fifth observation. It seems that at least in an environment where Sporadics are being shrugged of and features are rated higher then bug or regression fixes this human factor would invite people to discuss the rules and to ask for exceptions from them to just be able to do this and that very quickly. A human would be tempted to grant these exceptions and in special cases does grant them. As experience shows in many cases not sticking to the rules turns out to be the cause of a lot of trouble. In the case at hand what was supposed to be a shortcut, a speedup turned out to be a mess and a slowdown of things. Additional regressions were introduced.
With that experience in mind I try to remember what quarantine was all about. Which goals were expected to be achieved?
First of all the baseline of it all is the idea of continuous delivery where release candidates get produced all the time. If a certain pipeline run fails, this run does not yield a release candidate. Continuous delivery allows for frequent releases of small increments of the application thus getting fast feedback on any new features instead of waiting for yearly or half-yearly releases. The increment to be shipped has to be free of regressions. Otherwise it wouldn't be a release candidate for the pipeline to produce it would have been stopped when the error popped up.
So, if a quarantine gets introduced it must make sure
- the software to be shipped is free of regressions and
- the time between two release candidates is as short as possible to allow for small increments
So, allowed time in quarantine needs to be short.
Another flaw in the above quarantine rule set has been the fact that there would be a loophole for regressions to enter mainline. The fact that Sporadics need to show up twice before they would cause a close of mainline and with that open up the possibility to quarantine the regression needs to be dealt with. I'd propose to quarantine any regression no matter whether they are deterministic or not. There would be two days to get them fixed. If the fix does not arrive in due time the regression will leave quarantine and mainline will be closed. No test protocols need to be scanned for known not yet quarantined issues. The quarantine tool would report quarantined regressions properly and any new test failure would show up directly. While mainline is closed no other changes could be pushed to it. This would be an optimistic approach for it is based on the assumption that regressions get fixed with highest priority.
The new quarantine rule set would look like this:
- any regression will be quarantined for up to two days
- if the fix arrives within this time and it proved to be valid then the failing test will leave quarantine
- if the fix does not arrive then the failing test leaves quarantine and causes the mainline to be closed
- if the fix does arrive and it proved to be valid mainline will be opened again
This rule set is simple enough that a tool could implement it, thus removing the human factor from the decision making whether or not to close mainline. The human factor should be used to decide upon the proper limits for regressions to stay in quarantine for this seems to be dependent from the current situation an application development finds it self in. So in the end the CI server could take care of the regressions by enforcing a quarantine although I wasn't fond of this idea earlier. But it would do so based on a simple rule set which does not require a huge and sophisticated tool and the pipeline runs still would be reported as failures and no release candidate will be produced as long as there are quarantined regressions.
I will report whether this approach works in reality.
Read also:
Part 1: Sporadics don't matter
Part 2: Let the CI server take care of these Sporadics
Part 3: Where do these Sporadics come from
Part 4: Tune up the speakers - Let Sporadics scream at you
Part 5: Process vs. People - Or: Processes won't fix you bugs
Part 6: To Quarantine or Not To Quarantine Contagious Tests
The opinions expressed in this blog are my own views and not those of SAP
No comments:
Post a Comment