
fix those damn sporadically false positive usage tests #342

Closed
wants to merge 1 commit

Conversation

Dieterbe
Contributor

After lots of experimentation I figured out that the mock clock sometimes
simply doesn't trigger properly, so that in Usage.Report() sometimes
nothing is received on the tick channel, despite advancing the fake
clock by more than strictly necessary (I tried with an extra ms),
despite calling runtime.Gosched() ourselves, and despite sleeping
20 ms with the real clock.
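
For illustration, here is a minimal sketch of the kind of pattern that
proved flaky, assuming the github.com/benbjohnson/clock package; the
actual test code in this repo is more involved and the names below are
only illustrative:

```go
// A sketch of the flaky pattern, assuming github.com/benbjohnson/clock.
// Advancing the mock clock does not guarantee that the goroutine feeding
// the ticker channel has actually run, so the receive below can miss.
package usage_test

import (
	"runtime"
	"testing"
	"time"

	"github.com/benbjohnson/clock"
)

func TestTickSometimesMissing(t *testing.T) {
	mock := clock.NewMock()
	ticker := mock.Ticker(time.Second)
	defer ticker.Stop()

	mock.Add(time.Second + time.Millisecond) // advance past the interval, plus an extra ms
	runtime.Gosched()                        // yield, hoping the ticker goroutine gets scheduled

	select {
	case <-ticker.C:
		// usually the tick arrives...
	case <-time.After(20 * time.Millisecond):
		// ...but depending on goroutine scheduling it sometimes doesn't
		t.Fatal("no tick received despite advancing the mock clock")
	}
}
```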

The author of the clock package confirms that due to the way the
runtime schedules goroutines, there's no way around the fake clock
sometimes not working. See
https://gophers.slack.com/archives/general/p1462238960008162

Furthermore, the discussion with the Go developers at golang/go#8869
makes it clear that we're unlikely to have a fakeable clock anytime
soon.

Ben Johnson (the clock author) suggests in the above-mentioned gophers
thread that we could mock out the tick function and pass in a different
function in tests. However, that changes so much of the time logic
that it becomes pointless to do any time-based testing in this design.
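
For clarity, a rough sketch of what that suggestion amounts to; the
Usage type and its fields here are hypothetical stand-ins, not the
actual code in this repo:

```go
package usage

import "time"

// Usage is a simplified, hypothetical stand-in; only the injected tick
// source matters for this illustration.
type Usage struct {
	// tick delivers report triggers. Production code would wire it to a
	// real time.Ticker's channel; a test would feed it by hand instead.
	tick <-chan time.Time
}

// Report consumes ticks until the channel is closed, reporting once per tick.
func (u *Usage) Report() {
	for range u.tick {
		// ... compute and emit the usage metrics for this interval ...
	}
}
```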

We could also switch to simply testing the basics, without anything
time-based, since the timing code is pretty simple.

However, before we go that route, I wanted to try working with the real
clock: basically run the usage reporting in real time, but scaled down
to millisecond level instead of second level, so it finishes fairly
quickly.

So now some semantics are changing a bit:

  • we allow up to <period> ms for the usage report to be in the state we
    need it
  • the test now works with steps, which don't happen at exact predictable
    timestamps; rather, they have to happen within a timeframe (see the
    sketch after this list)
  • checking the timestamps would have gotten more complicated, so I just
    removed those checks. It's easy to reason that if the updates come
    within the allotted times, then the timestamps should also be set
    correctly.
  • there's no serious need to explicitly pass around interval settings
    anymore; we just use 1 everywhere.
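
As a sketch of what "within a timeframe" looks like in test code (the
helper name and polling interval here are mine, not necessarily what
this PR ends up with):

```go
package usage_test

import (
	"testing"
	"time"
)

// waitFor polls cond until it returns true or the timeframe expires.
// Instead of asserting state at an exact timestamp, the test only
// requires the state to be reached somewhere within the window.
func waitFor(t *testing.T, timeframe time.Duration, cond func() bool) {
	deadline := time.Now().Add(timeframe)
	for time.Now().Before(deadline) {
		if cond() {
			return
		}
		time.Sleep(time.Millisecond)
	}
	t.Fatal("condition not reached within the allotted timeframe")
}
```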

If it turns out that this approach also triggers false positives
(for example due to CircleCI machines being maxed out on CPU and the
reporting being unable to happen within the needed time), then we can
address that as needed and still switch to the simpler approach.
But that seems very unlikely. This should work.

@Dieterbe
Contributor Author

Dieterbe commented Oct 13, 2016

For the record, I just ran `i=1; while echo $i && go test; do i=$((i+1)); sleep 0.${RANDOM}s; done` until 310 and then canceled it. With master it would always fail before reaching 150, often before 20.

@woodsaj
Member

woodsaj commented Oct 13, 2016

I think you are going about testing this the wrong way. I would:

  1. move all of the inline functions to actual functions.
  2. create unit tests for the custom ticker.
  3. split "Report()" into a "Run()" and a "Report()" function (see the sketch below). Run() just runs the ticker, calling "Report()" on every tick until it is stopped.
  4. In your unit tests you then no longer need to worry about timing. You can just create a new "usage" instance, "u", and immediately stop it (ending the "Run" goroutine). Then call u.Add() to add as many values as you want and call u.Report() to push the usage metrics into aggmetrics, where you can then read and validate them.
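
Roughly what that split could look like; this is a sketch with made-up
field names and a simplified Report(), not the actual code in this repo:

```go
package usage

import (
	"sync"
	"time"
)

// Usage is a simplified stand-in that only shows the Run()/Report() split;
// the real aggregation and the push into aggmetrics are elided.
type Usage struct {
	sync.Mutex
	counts map[string]int64
	stop   chan struct{}
}

func New() *Usage {
	return &Usage{
		counts: make(map[string]int64),
		stop:   make(chan struct{}),
	}
}

// Add records a data point; tests can call this without any ticker running.
func (u *Usage) Add(key string) {
	u.Lock()
	u.counts[key]++
	u.Unlock()
}

// Report flushes the accumulated counts. In the real code this is where
// the usage metrics would be pushed into aggmetrics; here we just return them.
func (u *Usage) Report() map[string]int64 {
	u.Lock()
	defer u.Unlock()
	out := u.counts
	u.counts = make(map[string]int64)
	return out
}

// Run calls Report on every tick until Stop is called. Unit tests can stop
// it immediately (or never start it) and drive Report() by hand.
func (u *Usage) Run(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			u.Report()
		case <-u.stop:
			return
		}
	}
}

// Stop terminates Run.
func (u *Usage) Stop() {
	close(u.stop)
}
```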

@woodsaj woodsaj assigned Dieterbe and unassigned woodsaj Oct 14, 2016
Dieterbe added a commit that referenced this pull request Jan 2, 2017
Dieterbe added a commit that referenced this pull request Jan 2, 2017
Dieterbe added a commit that referenced this pull request Jan 3, 2017
Dieterbe added a commit that referenced this pull request Jan 3, 2017
@Dieterbe
Contributor Author

#666

@Dieterbe Dieterbe closed this Sep 30, 2017
@Dieterbe Dieterbe deleted the fix-usage-tests branch January 2, 2018 16:01
@imiric imiric mentioned this pull request Jul 7, 2020