Thursday, October 23, 2014

Interlude: A practical experience with quarantine

Sometimes one develops nice ideas or puts down some thoughts, and then reality kicks in and what was a nice theory comes under pressure from the events of daily life. The same happened to me. Recently I was talking about Sporadics and the attitude of shrugging them off. I was talking about people and the need to put them first and processes second. And I was talking about a quarantine that gives people a chance to tackle difficult-to-solve issues without the need to close mainline. I also mentioned my ambivalent feelings towards a quarantine. Just recently these ambivalent feelings proved valid, as a quarantine turned into an ever-growing bowl of bad apples and regressions entered mainline faster than one could count to five. What happened?

A quarantine had been set up with these rules (sketched in code right after the list):
  • any regression would be a possible reason to close mainline
  • if the regression turns out to be 
    • trivial then mainline will be closed until the fix arrives and proves to remove the regression
    • non-trivial and a fix would take some time and it is
      • a reproducible regression in a third party component then it will be quarantined for two weeks and mainline remains open
      • a reproducible regression in the application itself then it will be quarantined for two days and mainline remains open
    • a sporadic regression and
      • it showed up the first time then it will be observed as a known regression - no fix will be requested and mainline remains open
      • it showed up the second time then it will close mainline and an analysis will be requested; if the analysis shows the fix would be
        • trivial then mainline remains closed until the fix arrived and proved to remove the regression
        • non-trivial then it will be quarantined and mainline will be opened again
  • a quarantined regression that is not fixed before its time limit expires will close mainline
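
To see how convoluted this really is, here is a minimal sketch of the decision logic in Python. The Regression attributes are my own invention, and note that the rules above do not even specify a quarantine duration for Sporadics:

    from dataclasses import dataclass

    @dataclass
    class Regression:
        sporadic: bool        # a non-deterministic "Sporadic"?
        occurrences: int      # how often it has shown up so far
        trivial_fix: bool     # would the fix be trivial?
        in_third_party: bool  # located in a third-party component?

    def verdict(r: Regression) -> str:
        """What the original rule set decides for a given regression."""
        if r.sporadic:
            if r.occurrences < 2:
                return "observe as known regression; mainline stays open"
            # second sighting: mainline closes while the analysis runs
            if r.trivial_fix:
                return "mainline stays closed until a proven fix arrives"
            return "quarantine; mainline opens again"  # duration left open
        if r.trivial_fix:
            return "close mainline until the fix arrives and proves valid"
        if r.in_third_party:
            return "quarantine for two weeks; mainline stays open"
        return "quarantine for two days; mainline stays open"

    print(verdict(Regression(sporadic=True, occurrences=1,
                             trivial_fix=False, in_third_party=False)))

Three levels of branching for a handful of cases already hint at the trouble to come.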

It turned out to be a hell of a job to enforce this <sarcasm>simple</sarcasm> set of rules, to convince people to consider the removal of regressions their primary task, and to make clear that disabling the test that unveils them is not a solution unless the test itself turns out to be buggy. That's why I need to touch on the topic of quarantine once again.

The original idea would work if certain assumptions were true:
  • "... quarantine would serve as a to-do list of open issues to be worked on with highest priority"
  • quarantine rules will be strictly applied

If either of them does not hold, the whole concept is broken. I had further expressed my concerns about a quarantine:

"If there is a culture where these things are always overruled by something seemingly more important then a quarantine will just be another issue tracker with loads of issues never to be handled or like the to-do lists on dozens of desks which never will be worked on to the very end. It's just another container of things one should try to do when there is some time left."
And exactly this is what happened. It made me think: either the concept of quarantine was a dead end, or the way it had been implemented was plain wrong. What are my observations?

First, the set of rules is non-trivial. It takes some time to understand which cases are possible and whether a given regression will close mainline, be quarantined, or just be observed. This rule set is hard to remember, both for the people who have to enforce it and for the people who would need to fix the regressions.

Second, if you consider a green mainline the goal to achieve, and if you accept the fact that red pipeline runs require a thorough analysis of what went wrong, then the design of this quarantine has its flaws. It allows red pipeline runs on mainline in the case of a Sporadic occurring for the first time. Any deterministic regression gets handled by closing mainline or at least by putting it under quarantine, while non-deterministic regressions can enter mainline.

Third, there was no tool to put regressions into quarantine and to report on them with every pipeline run, in order to make sure newly failing tests show up easily. Instead, all test failures were reported together and required analysis. Many pipeline runs were analyzed that only contained quarantined test failures, causing a lot of wasted work.
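
What was missing is not much more than a diff against the quarantine list. A minimal sketch of such a report, assuming the quarantine is kept as a plain set of test names (the names below are made up):

    def triage(failed_tests, quarantined):
        """Split a pipeline run's failures into known and new ones."""
        known = failed_tests & quarantined
        new_failures = failed_tests - quarantined
        print(f"{len(known)} known quarantined failure(s): {sorted(known)}")
        if new_failures:
            print(f"ANALYZE {len(new_failures)} new failure(s): {sorted(new_failures)}")
        else:
            print("Only quarantined failures; no manual analysis needed.")
        return new_failures

    triage({"test_login", "test_export"}, quarantined={"test_export"})

Every run whose failures are all quarantined could then be skipped by the analysts.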

Fourth, the rule set wasn't enforced completely. Regressions that reached their maximum time in quarantine did not cause mainline to be closed. Instead they started littering pipeline runs, and together with the missing quarantine reporting it became even harder to find the newly failing tests, and thus the regressions a single change had introduced.

Obviously this quarantine has been a failure both in concept and in execution. It needs a different configuration to make it work: something very simple and straightforward, so that everyone can memorize it and a tool can enforce it. The latter marks the basic shift in my mind. Last time I argued against an automatism and in favor of a human factor:
"Instead of having an uncompromisable automatism which would just close mainline until a proven fix arrived a human would be able to weigh different aspects against each other and to decide not to close mainline even if a Sporadic popped up"
And this was my fifth observation. At least in an environment where Sporadics are shrugged off and features are rated higher than bug or regression fixes, this human factor invites people to discuss the rules and to ask for exceptions, just to be able to get this or that done very quickly. A human is tempted to grant these exceptions, and in special cases does grant them. As experience shows, in many cases not sticking to the rules turns out to be the cause of a lot of trouble. In the case at hand, what was supposed to be a shortcut and a speedup turned out to be a mess and a slowdown. Additional regressions were introduced.

With that experience in mind, I try to remember what quarantine was all about. Which goals was it expected to achieve?

First of all, the baseline of it all is the idea of continuous delivery, where release candidates get produced all the time. If a pipeline run fails, this run does not yield a release candidate. Continuous delivery allows for frequent releases of small increments of the application, and thus fast feedback on any new feature, instead of waiting for yearly or half-yearly releases. The increment to be shipped has to be free of regressions; otherwise it wouldn't be a release candidate, for the pipeline producing it would have been stopped when the error popped up.

So, if a quarantine gets introduced it must make sure
  • the software to be shipped is free of regressions and
  • the time between two release candidates is as short as possible to allow for small increments

If a regression gets up to two weeks in quarantine, experience shows that it will stay there for pretty much exactly that time. Considering this, it is not feasible to quarantine a regression for two weeks, because doing so stretches the time between two release candidates to at least this amount of time. If within those two weeks one more regression of this sort gets quarantined, the time between two release candidates grows even longer. One regression every two weeks would make it impossible to produce a release candidate at all, although one regression every two weeks would be a pretty good rate. What's more, the increment would become quite large. If a quarantine of two weeks is not possible, then waiting for fixes of 3rd-party components is not possible any more either, unless the 3rd-party component comes with releases at a high frequency. With that, the application has to mitigate issues in any 3rd-party library by not using the buggy feature at all, which may mean working around it or dropping the application's own feature if it depends on the very feature of the 3rd-party component that turned out to be buggy. If this becomes the default mitigation in the application, a quarantine of two weeks isn't needed anymore: independent of whether the regression is located in the application or in a 3rd-party component, the fix is made in the application itself. The team developing the application would regain control over the application, which in itself would be a nice thing to achieve.
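
A back-of-the-envelope sketch makes this visible. The arrival days are made up, but one new regression every two weeks, each quarantined for two weeks, keeps the pipeline red without a single gap:

    def red_days(arrival_days, quarantine_days, horizon):
        """Count the days on which at least one regression sits in quarantine."""
        red = set()
        for day in arrival_days:
            red.update(range(day, day + quarantine_days))
        return len(red & set(range(horizon)))

    # one regression every two weeks, two weeks of quarantine each
    print(red_days([0, 14, 28, 42], quarantine_days=14, horizon=56))  # -> 56

Fifty-six red days out of fifty-six: not a single release candidate in eight weeks.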

So, allowed time in quarantine needs to be short.

Another flaw in the above quarantine rule set was the loophole it left for regressions to enter mainline: Sporadics need to show up twice before they cause mainline to close and thereby become candidates for quarantine. This needs to be dealt with. I propose to quarantine any regression, no matter whether it is deterministic or not. There would be two days to get it fixed. If the fix does not arrive in due time, the regression leaves quarantine and mainline is closed. No test protocols need to be scanned for known but not yet quarantined issues: the quarantine tool reports quarantined regressions properly, and any new test failure shows up directly. While mainline is closed, no other changes can be pushed to it. This is an optimistic approach, for it is based on the assumption that regressions get fixed with highest priority.

The new quarantine rule set would look like this:
  • any regression will be quarantined for up to two days
  • if the fix arrives within this time and proves to be valid, the failing test will leave quarantine
  • if the fix does not arrive in time, the failing test leaves quarantine and causes mainline to be closed
  • if the fix arrives afterwards and proves to be valid, mainline will be opened again

That's it. Simple and straightforward. No distinction between types of regressions. If required, an upper limit on the number of regressions in quarantine could be added; once it is reached, mainline would be closed as well. But then the rules to open mainline would get a bit more complex, for there would be two conditions to consider. That's why I prefer not to have the upper limit of quarantined regressions in the rule set. The two-day limit is tight enough to put pressure on the resolution of regressions.

This rule set is simple enough for a tool to implement it, thus removing the human factor from the decision whether or not to close mainline. The human factor should instead be used to decide on the proper limit for how long regressions may stay in quarantine, for this seems to depend on the current situation an application's development finds itself in. So in the end the CI server could take care of the regressions by enforcing a quarantine, although I wasn't fond of this idea earlier. But it would do so based on a simple rule set which does not require a huge and sophisticated tool; the pipeline runs would still be reported as failures, and no release candidate would be produced as long as there are quarantined regressions.
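
As a minimal sketch of what such a tool could look like, assuming the CI server feeds it events (all class and method names here are my own invention):

    from datetime import datetime, timedelta

    class QuarantineGate:
        """Enforce the new rule set: quarantine everything, but briefly."""

        def __init__(self, limit=timedelta(days=2)):
            self.limit = limit      # the one knob a human decides upon
            self.quarantine = {}    # test name -> time it entered quarantine
            self.mainline_open = True

        def on_regression(self, test: str, now: datetime) -> None:
            # any regression, deterministic or sporadic, goes straight in
            self.quarantine.setdefault(test, now)
            self._evaluate(now)

        def on_proven_fix(self, test: str, now: datetime) -> None:
            # a fix that proved to remove the regression releases the test
            self.quarantine.pop(test, None)
            self._evaluate(now)

        def check(self, now: datetime) -> bool:
            # called before each push and periodically by the CI server
            self._evaluate(now)
            return self.mainline_open

        def _evaluate(self, now: datetime) -> None:
            # mainline is open exactly when no quarantined regression
            # has exceeded its time limit
            self.mainline_open = not any(
                now - entered > self.limit
                for entered in self.quarantine.values())

As long as the quarantine is non-empty, a run would still be reported as a failure and yield no release candidate; only the decision to close mainline is automated.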

I will report whether this approach works in reality.


Read also:

Part 1: Sporadics don't matter
Part 2: Let the CI server take care of these Sporadics
Part 3: Where do these Sporadics come from
Part 4: Tune up the speakers - Let Sporadics scream at you
Part 5: Process vs. People - Or: Processes won't fix you bugs
Part 6: To Quarantine or Not To Quarantine Contagious Tests


The opinions expressed in this blog are my own views and not those of SAP
