-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent on_cancellation_job & on_completion_job deserialization failure blocking cleanup #24
base: master
Are you sure you want to change the base?
Conversation
lib/delayed/job_groups/job_group.rb
Outdated
begin | ||
job = on_cancellation_job | ||
job_options = on_cancellation_job_options | ||
rescue StandardError => e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we only catch Delayed::DeserializationError
s so we'll crash and retry for other types of errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my scenario, I got an ArgumentError
. Checking out https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/backend/base.rb#L73, it looks like quite a long list of errors that can be generated by YAML.load_dj(handler)
. I'm happy to copy paste all of those errors in here if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yuck. I was hoping it was just a single exception to catch. It's not ideal but I think copying that list of errors will avoid mistakenly rescuing some classes of errors.
lib/delayed/job_groups/job_group.rb
Outdated
begin | ||
job = on_completion_job | ||
job_options = on_completion_job_options | ||
rescue StandardError => e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about only catching Delayed::DeserializationError
lib/delayed/job_groups/job_group.rb
Outdated
rescue StandardError => e | ||
Delayed::Worker.logger.info('Failed to deserialize the on_completion_job or on_completion_job_options for ' \ | ||
"job_group_id=#{id}. Skipping on_completion_job to clean up job group.") | ||
error_reporter.call(e) if error_reporter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swallowing these errors when a job group is trying to complete seems dangerous since the job group really hasn't completed. Do you think we need to introduce the notion of a job group being in a failed state so we don't crash the worker by continually retrying but still capture the fact that the job group didn't complete. We might need something similar for cancels too since the job group really hasn't been completely canceled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose an alternative here is that the job group could mark itself as blocked instead of destroying itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we still need to manage some state to indicate why a job group was blocked e.g. so you could query the DB for the list of job groups that are blocked due to failure and unblock them after a fix has been deployed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would certainly be more convenient if we did that. I think we could get away with using a single failed_at timestamp on the on_cancellation_job or the on_completion_job that would be set if deserialization failed. Blocking the job group still makes sense right?
context "on_completion_job refers to missing class" do | ||
# The on_completion_job needs the class to be defined this way in order to serialize it | ||
# rubocop:disable RSpec/LeakyConstantDeclaration,Style/ClassAndModuleChildren,Lint/ConstantDefinitionInBlock | ||
module Delayed::JobGroups::JobGroupTestHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we define this constant in a before
so we're sure it will be present when the example starts running? The current approach won't work if we ever added multiple examples to this example group.
* Explicit error classes rescued * Test begin block * Add failed_at column to delayed_job_groups
This addresses #22.
This skips enqueueing the on_cancellation_job or on_completion_job if there is an error deserializing it. Adds an optional proc,
error_reporter
, that accepts one argument (the raised error) when the job group encounters a deserialization error.prime: @jturkel