-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Performance improvement for runtime env serialization #48749
base: master
Are you sure you want to change the base?
[core] Performance improvement for runtime env serialization #48749
Conversation
Signed-off-by: dentiny <[email protected]>
python/ray/remote_function.py
Outdated
# runtime env will be merged and re-serialized. | ||
# | ||
# Caveat: for `func.option().remote()`, we have to recalculate serialized | ||
# runtime env info upon every call. But it's acceptable since pre-calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be more clear,
To support dynamic runtime envs in `func.option(runtime_env={...}).remote()`, we recalculate the serialized runtime env info in the `option` call. If there are multiple calls to a same option, one can save the calculation by `opt_f = func.option(runtime_env={...}); [opt_f.remote() for i in range(many)]`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the "If there are multiple calls to a same option" part, since we don't do any caching for option
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adopted other comments.
# only update runtime_env when ".options()" specifies new runtime_env | ||
# Only update runtime_env and re-calculate serialized runtime env info when | ||
# ".options()" specifies new runtime_env. | ||
serialized_runtime_env_info = self._serialized_base_runtime_env_info | ||
if "runtime_env" in task_options: | ||
updated_options["runtime_env"] = parse_runtime_env( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should no longer populate updated_options["runtime_env"] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We basically have two options:
- Keep
updated_options["runtime_env"]
updated, as we did in the past - Remove
runtime_env
fromupdated_options
I choose the first method, becase updated_options
corresponds to task_options
and default_options
, so people have expectation there's runtime_env
in these options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, ok since we need to support bind
as well, where this optimization is not used
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Addresses issue #48591
The problem is:
runtime_env
inremote
decorator, the parsing and serialization happens for each function invocationparse_runtime_env
, which involves a dictionary to class transformationget_runtime_env_info
, which serialize a class into json formatDiscussed with @rynewang , the proposed solution here is cache pre-calculated serialized runtime env info, so the parsing and serialization only happens once at initialization.
Benchmarked with the test @jjyao mentioned on the ticket, I confirm we could reach similar performance between no env var vs with env var.
Alternatives considered:
functools.cache
forget_runtime_env_info
, which is a stateless function