[core] Performance improvement for runtime env serialization #48749

Open: wants to merge 5 commits into base: master


@dentiny (Contributor) commented Nov 14, 2024

Addresses issue #48591

The problem is:

  • If anything is specified in runtime_env in the remote decorator, parsing and serialization happen on every function invocation
    • Parsing calls parse_runtime_env, which transforms a dictionary into a class
    • Serialization calls get_runtime_env_info, which serializes a class into JSON

After discussing with @rynewang, the proposed solution is to cache the pre-calculated serialized runtime env info, so parsing and serialization happen only once, at initialization.
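A minimal sketch of the caching idea, assuming a simplified model of a remote function: the runtime_env given to the decorator is parsed and serialized once in `__init__`, and every `remote()` call reuses the stored string. `ParsedRuntimeEnv`, `parse_env`, `serialize_env` and `CachedRemoteFunction` are hypothetical stand-ins, not Ray APIs; only the attribute name `_serialized_base_runtime_env_info` is borrowed from the diff.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple


class ParsedRuntimeEnv(dict):
    """Hypothetical parsed runtime env (models the dict-to-class step)."""


def parse_env(runtime_env: Dict[str, Any]) -> ParsedRuntimeEnv:
    # Hypothetical stand-in for parse_runtime_env.
    return ParsedRuntimeEnv(runtime_env)


def serialize_env(parsed: ParsedRuntimeEnv) -> str:
    # Hypothetical stand-in for get_runtime_env_info plus JSON serialization.
    return json.dumps(parsed, sort_keys=True)


class CachedRemoteFunction:
    """Hypothetical model: parse and serialize the decorator's runtime_env once."""

    def __init__(self, func: Callable[..., Any],
                 runtime_env: Optional[Dict[str, Any]] = None) -> None:
        self._func = func
        self.serialize_calls = 0  # how many times the expensive step ran
        # Pre-calculated at construction (decorator) time.
        self._serialized_base_runtime_env_info = self._parse_and_serialize(
            runtime_env or {})

    def _parse_and_serialize(self, runtime_env: Dict[str, Any]) -> str:
        self.serialize_calls += 1
        return serialize_env(parse_env(runtime_env))

    def remote(self, *args: Any) -> Tuple[Any, str]:
        # Each invocation reuses the cached string; no per-call parsing.
        return self._func(*args), self._serialized_base_runtime_env_info


f = CachedRemoteFunction(lambda x: x * 2, runtime_env={"env_vars": {"A": "1"}})
results = [f.remote(i) for i in range(1000)]
print(f.serialize_calls)  # 1
```

With this shape, the cost of parsing and serialization is paid in the constructor and no longer scales with the number of invocations.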

Using the benchmark @jjyao mentioned on the ticket, I confirmed that invocations with env vars now perform similarly to invocations without them.

Alternatives considered:

  • Use functools.cache for get_runtime_env_info, which is a stateless function
    • The caveat is that we would need an acceptable way to decide whether two sets of serialization options and runtime env info are the same; simply traversing all fields is not a good plan

@dentiny added the go label (add ONLY when ready to merge, run all tests) on Nov 14, 2024
# runtime env will be merged and re-serialized.
#
# Caveat: for `func.options().remote()`, we have to recalculate the serialized
# runtime env info upon every call. But it's acceptable since pre-calculation
Contributor:

To be clearer:

To support dynamic runtime envs in `func.options(runtime_env={...}).remote()`, we recalculate the serialized runtime env info in the `options` call. If there are multiple calls to a same option, one can save the calculation by `opt_f = func.options(runtime_env={...}); [opt_f.remote() for i in range(many)]`.
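The suggested wording assumes a design in which a runtime_env override is serialized when `options()` is called, so a reused options handle amortizes the cost. The sketch below models only that assumed design and is not a statement about how Ray's `options()` is actually implemented; `SketchRemoteFunction` and `OptionsWrapper` are hypothetical names, and `json.dumps` stands in for the real parse and serialize step.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple


class SketchRemoteFunction:
    """Hypothetical remote-function model; not Ray's RemoteFunction."""

    def __init__(self, func: Callable[..., Any],
                 runtime_env: Optional[Dict[str, Any]] = None) -> None:
        self._func = func
        self.serialize_calls = 0
        self._serialized = self._serialize(runtime_env or {})

    def _serialize(self, runtime_env: Dict[str, Any]) -> str:
        self.serialize_calls += 1  # stands in for parse + serialize
        return json.dumps(runtime_env, sort_keys=True)

    def _invoke(self, serialized: str, *args: Any) -> Tuple[Any, str]:
        return self._func(*args), serialized

    def remote(self, *args: Any) -> Tuple[Any, str]:
        return self._invoke(self._serialized, *args)

    def options(self, runtime_env: Optional[Dict[str, Any]] = None) -> "OptionsWrapper":
        # Assumed design: a runtime_env override is serialized here, once.
        if runtime_env is None:
            return OptionsWrapper(self, self._serialized)
        return OptionsWrapper(self, self._serialize(runtime_env))


class OptionsWrapper:
    """Hypothetical handle returned by options(); reuses its serialized env."""

    def __init__(self, parent: SketchRemoteFunction, serialized: str) -> None:
        self._parent = parent
        self._serialized = serialized

    def remote(self, *args: Any) -> Tuple[Any, str]:
        return self._parent._invoke(self._serialized, *args)


env = {"env_vars": {"A": "1"}}

# options() inside the loop: one serialization per iteration (plus one at init).
f = SketchRemoteFunction(lambda x: x + 1)
for i in range(10):
    f.options(runtime_env=env).remote(i)

# options() hoisted out of the loop: the override is serialized once.
g = SketchRemoteFunction(lambda x: x + 1)
opt_g = g.options(runtime_env=env)
results = [opt_g.remote(i) for i in range(10)]
print(f.serialize_calls, g.serialize_calls)  # 11 2
```

Under this assumed design, hoisting `options()` out of the loop turns a per-call cost into a one-time cost; if the serialization instead happens inside each `remote()` call, hoisting saves nothing.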

Contributor Author:

I'm not sure I follow the "If there are multiple calls to a same option" part, since we don't do any caching for option calls.

Contributor Author:

Adopted the other comments.

python/ray/remote_function.py: two outdated review threads, resolved
# only update runtime_env when ".options()" specifies new runtime_env
# Only update runtime_env and re-calculate serialized runtime env info when
# ".options()" specifies new runtime_env.
serialized_runtime_env_info = self._serialized_base_runtime_env_info
if "runtime_env" in task_options:
    updated_options["runtime_env"] = parse_runtime_env(
Contributor:

I think we should no longer populate updated_options["runtime_env"]?

Contributor Author:

We basically have two options:

  • Keep updated_options["runtime_env"] updated, as we did in the past
  • Remove runtime_env from updated_options

I chose the first option, because updated_options corresponds to task_options and default_options, so people expect runtime_env to be present in these options.

Contributor:

Hmm, OK, since we need to support bind as well, where this optimization is not used.
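The control flow in the diff, combined with the choice to keep `updated_options["runtime_env"]` populated, can be sketched as a small pure function. `resolve_runtime_env` is a hypothetical helper that assumes option merging is a plain dict union and uses `json.dumps` in place of `parse_runtime_env`/`get_runtime_env_info`; the diff's own comment notes that the runtime env is merged before re-serialization, and that merge is omitted here.

```python
import json
from typing import Any, Dict, Tuple


def resolve_runtime_env(
    default_options: Dict[str, Any],
    task_options: Dict[str, Any],
    serialized_base_runtime_env_info: str,
) -> Tuple[Dict[str, Any], str]:
    """Hypothetical helper mirroring the diff's control flow (not Ray code)."""
    updated_options = {**default_options, **task_options}
    # Default: reuse the info serialized once at decorator time.
    serialized_runtime_env_info = serialized_base_runtime_env_info
    if "runtime_env" in task_options:
        # Only an options() override pays for parsing and serialization again.
        parsed = dict(task_options["runtime_env"] or {})  # stand-in for parse_runtime_env
        # Keep updated_options["runtime_env"] populated, so paths that read the
        # options directly (the thread mentions bind) still see the runtime env.
        updated_options["runtime_env"] = parsed
        serialized_runtime_env_info = json.dumps(parsed, sort_keys=True)
    return updated_options, serialized_runtime_env_info


base_env = {"env_vars": {"A": "1"}}
base_serialized = json.dumps(base_env, sort_keys=True)
defaults = {"num_cpus": 1, "runtime_env": base_env}

opts, info = resolve_runtime_env(defaults, {}, base_serialized)
override_opts, override_info = resolve_runtime_env(
    defaults, {"runtime_env": {"env_vars": {"B": "2"}}}, base_serialized)
print(info == base_serialized, override_info)  # True {"env_vars": {"B": "2"}}
```

Keeping the options dict populated costs little, since the override is parsed anyway, and it avoids a second code path for consumers that never see the pre-serialized string.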

Signed-off-by: dentiny <[email protected]>
@jcotant1 added the core label (Issues that should be addressed in Ray Core) on Nov 15, 2024
Labels: core, go

3 participants