Don't broadcast Checkable#next_check updates made just not to check twice #10093
base: master
No, I'm not! I haven't even checked all of them in detail, but aren't they just like the other superfluous ones from
I don't think so. icinga2/lib/icinga/checkable-check.cpp Lines 392 to 404 in bca1a84
icinga2/lib/icinga/checkable-check.cpp Lines 406 to 420 in bca1a84
In particular, these two reschedule the next check of other checkables because something happened to the current one. The other HA node may be responsible for them, so of course this matters for the cluster. But IMAO this doesn't matter for our backends. I think we can stop our anti-SetNextCheck witch-hunt here.
We literally trigger an
…wice The checker sorts Checkables by next_check while picking the next due one, so we (already) have to advance next_check while starting a check. But the second master doesn't need this info, as it's not responsible.
In addition, CheckerComponent::NextCheckChangedHandler needs these two (not suppressed!) events too, so that these next_check updates are effective at all.
Honestly, I don't understand why you're turning down all the suggestions just to stick to the first idea that came to your mind. Let's be real, no one said this should work out right away with either suggestion. However, compared to adding yet another useless cluster event, this could be a much better solution. We already have enough problems with the countless/unpredictable RPC messages to deal with. So, why add yet another one when there's a better alternative? I'm not saying these
Actually, I'm totally fine with both options (#10082 (comment)): a new event or a flag in the existing one. Also, I just said (#10093 (comment)) that these events are needed locally. For the flag option, we could add flag(s) to setters and event handlers, so that a handler can say: Oh, this event is not for broadcasting, so I won't send it as a cluster message. Ok?
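The flag idea from this comment could look roughly like the following. This is a minimal sketch under stated assumptions, not Icinga 2 code: `Checkable` here is a stripped-down stand-in, and `Counters`, `Wire` and the `suppressBroadcast` parameter are hypothetical names chosen for illustration. It only shows the shape of the idea: the setter always fires the event, the local consumer always reacts, and the cluster consumer skips events flagged as local-only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a checkable with a next_check event.
struct Checkable {
    using Handler = std::function<void(const Checkable&, bool suppressBroadcast)>;

    std::string name;
    double nextCheck = 0;
    std::vector<Handler> onNextCheckChanged;

    // The flag travels with the event, so each handler decides for itself
    // whether the update is relevant to it.
    void SetNextCheck(double ts, bool suppressBroadcast = false) {
        assert(ts >= 0 && "timestamps are non-negative in this sketch");
        nextCheck = ts;
        for (const auto& handler : onNextCheckChanged)
            handler(*this, suppressBroadcast);
    }
};

// Counts what each consumer did, so the effect is observable.
struct Counters {
    int localUpdates = 0;    // stands in for the checker re-sorting its index
    int clusterMessages = 0; // stands in for a message sent to the other node
};

// Registers one local and one cluster-facing handler (both hypothetical).
void Wire(Checkable& checkable, Counters& counters) {
    // Local consumer: needs every update, flagged or not.
    checkable.onNextCheckChanged.push_back(
        [&counters](const Checkable&, bool) { ++counters.localUpdates; });

    // Cluster consumer: drops updates that were flagged as local-only.
    checkable.onNextCheckChanged.push_back(
        [&counters](const Checkable&, bool suppressBroadcast) {
            if (!suppressBroadcast)
                ++counters.clusterMessages;
        });
}
```

With this wiring, a plain `SetNextCheck(ts)` reaches both consumers, while `SetNextCheck(ts, true)` still updates the local consumer but produces no cluster message.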
0a14846 to 6533a50
How does this relate to that issue now? If this PR was merged as-is, how would you address that bug?
- /* This calls SetNextCheck() which updates the CheckerComponent's idle/pending
+ /* This calls SetNextCheck() for a later update of the CheckerComponent's idle/pending
   * queues and ensures that checks are not fired multiple times. ProcessCheckResult()
   * is called too late. See #6421.
   */
- UpdateNextCheck();
+ UpdateNextCheck(nullptr, true);
> for a later update of the CheckerComponent's idle/pending

When will this later update happen?

> ProcessCheckResult() is called too late.

In particular, is it earlier than this?
> for a later update of the CheckerComponent's idle/pending
>
> When will this later update happen?

- First, CheckerComponent::ExecuteCheckHelper calls checkable->ExecuteCheck (https://github.com/Icinga/icinga2/blob/v2.14.2/lib/checker/checkercomponent.cpp#L233)
- That calls UpdateNextCheck (https://github.com/Icinga/icinga2/blob/v2.14.2/lib/icinga/checkable-check.cpp#L563)
- Finally, CheckerComponent::ExecuteCheckHelper updates the index (https://github.com/Icinga/icinga2/blob/v2.14.2/lib/checker/checkercomponent.cpp#L266)

> ProcessCheckResult() is called too late.
>
> In particular, is it earlier than this?

Imagine, you have a simple Python plugin doing some basic network I/O. I know that Python/C++ meme is a bit silly, but actually, compared to how quick CheckerComponent returns to its loop which gets the next item from this index, your Python plugin (and ProcessCheckResult) takes centuries.
> Imagine, you have a simple Python plugin doing some basic network I/O. I know that Python/C++ meme is a bit silly, but actually, compared to how quick CheckerComponent returns to its loop which gets the next item from this index, your Python plugin (and ProcessCheckResult) takes centuries.

I don't get what this is trying to say, but it sounds like you're describing a race condition. Are you saying it's not a problem because check plugins will be slow enough? That sounds like the opposite of the comment: the slower the plugin, the later ProcessCheckResult() will be called. On the other hand, you can also get quickly failing checks, for example by specifying a non-existent path so that execution fails immediately. That shouldn't break Icinga 2 either.
No, there is no race condition. Checkable::ExecuteCheck first calls UpdateNextCheck, then GetCheckCommand()->Execute. The plugin can't fail before UpdateNextCheck is called.
What I meant was: because plugins are in general rather slow, ProcessCheckResult() comes with latency. But the checker index needs SetNextCheck now(!!), which is why UpdateNextCheck is called. It's that simple IIRC.
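The ordering argument above can be illustrated with a toy model. This is a sketch under stated assumptions, not the CheckerComponent's actual data structures: `Index` and `RunSchedulerPasses` are hypothetical names, the real idle/pending queues are ignored, and the check result is assumed to arrive only after all scheduler passes. It shows why an index sorted by next_check needs the timestamp advanced when a check starts rather than when its result comes in.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a checker index: (next_check, name) pairs,
// ordered by next_check because std::pair compares its first member first.
using Index = std::set<std::pair<double, std::string>>;

// Simulates `passes` iterations of a scheduler loop at time `now` and returns
// the names of the checks it started. The (slow) check result is assumed to
// arrive only after all passes are done.
std::vector<std::string> RunSchedulerPasses(Index index, double now,
                                            double interval, int passes,
                                            bool advanceAtStart) {
    std::vector<std::string> started;

    for (int i = 0; i < passes; ++i) {
        if (index.empty() || index.begin()->first > now)
            break; // nothing is due

        const auto entry = *index.begin();
        started.push_back(entry.second);

        if (advanceAtStart) {
            // Re-insert with an advanced next_check so the entry sorts later.
            index.erase(index.begin());
            index.emplace(now + interval, entry.second);
        }
        // Otherwise the entry stays at the front and is picked again.
    }

    return started;
}
```

In this model, with `advanceAtStart` set, a due entry is started once and then sorts behind the current time. Without it, every pass picks the same entry again until the result finally arrives.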
- Did I get you (Overdue state doesn't honor set time periods #10082 (comment)) right, that this PR is not (part of) the solution for that issue? If yes, let's remove this PR from v2.14.3.
The checker sorts Checkables by next_check while picking the next due one, so we (already) have to advance next_check while starting a check. But the second master doesn't need this info, as it's not responsible.
refs #10082
TODO
- next_check for remotely generated cr #10011