[Feature] Emit metrics of cluster creation and other related metrics for observability #2681

Open
2 tasks done
ujjawal-khare-27 opened this issue Dec 23, 2024 · 3 comments

@ujjawal-khare-27

Search before asking

  • I searched the issues and found no similar feature request.

Description

In production, Ray cluster creation must adhere to strict SLAs. Unexpected delays in cluster creation can stem from several factors, such as prolonged image pull times, node creation delays, or suboptimal KubeRay settings. Detecting these delays currently requires manual monitoring, which is cumbersome and prone to missing critical events or edge cases. Automating this monitoring by emitting metrics is essential to improve reliability and efficiency.
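
To make the request concrete, here is a minimal sketch of one such metric: the time from cluster creation to readiness, recorded in histogram buckets. It is only an illustration under stated assumptions and is not KubeRay code. The names `ProvisionHistogram` and `provision_seconds`, the bucket bounds, and the choice of "creation timestamp" and "ready timestamp" as inputs are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Hypothetical bucket upper bounds in seconds (not taken from KubeRay).
BUCKETS = [30, 60, 120, 300, 600]


def parse_k8s_time(ts):
    """Parse a Kubernetes-style UTC timestamp such as 2024-12-23T10:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def provision_seconds(created_at, ready_at):
    """Seconds between cluster creation and the moment it became ready."""
    return (parse_k8s_time(ready_at) - parse_k8s_time(created_at)).total_seconds()


class ProvisionHistogram:
    """Per-bucket (non-cumulative) counts; the last slot holds values above all bounds."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        self.counts = [0] * (len(self.bounds) + 1)
        self.total_seconds = 0.0

    def observe(self, seconds):
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total_seconds += seconds


hist = ProvisionHistogram(BUCKETS)
hist.observe(provision_seconds("2024-12-23T10:00:00Z", "2024-12-23T10:01:30Z"))
```

A histogram rather than a single gauge lets an SLA be checked as "what fraction of clusters were ready within N seconds", and separate histograms per phase (image pull, node creation) would show where a delay comes from.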

Use case

Deeper insights and observability

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@kevin85421
Member

kevin85421 commented Jan 15, 2025

@ujjawal-khare-27 this makes sense to me. Would you mind providing a list of metrics that you think would be helpful? We will define the scope for v1.4.0. Metrics about startup time may be a good place to start, since those are the missing pieces in Ray metrics.

@kevin85421
Member

I read the blog from @cchen777. The following picture is a good reference.

Image

@win5923
Contributor

win5923 commented Jan 16, 2025

@kevin85421 Can I take this?
