
feat(relay): expose internal relay metrics #4497

Open · wants to merge 18 commits into base: master

Conversation

@Litarnus (Contributor) commented Feb 12, 2025

This PR adds an endpoint that can be used to query internal Relay metrics.
In its current state it exposes internally collected memory metrics and a simple up metric so that we can count the number of running instances.
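
For context, here is a minimal sketch of how such a payload could be serialized into the Prometheus text format. It assumes a hypothetical InternalMetrics struct (the field names are illustrative, not the PR's actual types) and reuses the serde_prometheus call shape that appears later in this diff.

use std::collections::HashMap;

use serde::Serialize;

// Hypothetical payload; the PR's real struct and field names may differ.
#[derive(Serialize)]
struct InternalMetrics {
    // Internally collected memory usage, in bytes.
    memory_used: u64,
    // Constant 1 per instance, used to count running instances.
    up: u64,
}

fn render(metrics: &InternalMetrics) -> String {
    // Same call shape as in the diff further down: serialize into the Prometheus text format.
    serde_prometheus::to_string(metrics, None, HashMap::new())
        .expect("serializing static gauges should not fail")
}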

@Litarnus marked this pull request as ready for review February 12, 2025 15:47
@Litarnus requested a review from a team as a code owner February 12, 2025 15:47
@Dav1dde (Member) left a comment

Maybe, as suggested, we work on this in multiple steps:

  • Expose the endpoint with basic/mock internal machinery (a minimal sketch follows below)
  • Expose the easier metrics first (e.g. prevent scale down)
  • Tackle the harder metrics, like service utilization, generically
  • Then, possibly afterwards, instrument the special services we have, like the processor and store
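
As a rough illustration of that first step, the mock endpoint could start out as small as the following axum-style handler (the name and body are placeholders, not the PR's implementation), wired up via the routing change shown below:

use axum::http::StatusCode;
use axum::response::IntoResponse;

// Placeholder handler: returns a fixed Prometheus-style body until the real
// internal machinery is wired up.
pub async fn handle() -> impl IntoResponse {
    (StatusCode::OK, "up 1\n")
}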

.route("/api/relay/events/{event_id}/", get(events::handle));
let internal_routes = internal_routes
    .route("/api/relay/events/{event_id}/", get(events::handle))
    .route("/api/relay/internal-metrics", get(internal_metrics::handle))
Member:

Not sure what we want to call the route; maybe we want to make it autoscaling-specific?

Metrics is already so overloaded.

Contributor (Author):

Fair point, maybe just /keda?

Member:

I think we should stick to the more generic autoscaling or similar, since this isn't directly consumed by Keda, but the purpose is auto scaling.

Contributor (Author):

Sounds good, I'll rename it.

relay-server/src/services/processor.rs (outdated review comment, resolved)
@Litarnus (Contributor, Author) commented:
@Dav1dde I went with the current approach because it seemed easier than expected. I will take a step back and switch to the easier metrics first, as you suggested.

@Litarnus marked this pull request as draft February 12, 2025 16:41
@Litarnus self-assigned this Feb 13, 2025
@Litarnus marked this pull request as ready for review February 13, 2025 08:17
}
};

match serde_prometheus::to_string(&data, None, HashMap::new()) {
@Dav1dde (Member) commented Feb 14, 2025:

This adds a bunch of dependencies; maybe we can just serialize it ourselves, since the format is quite simple and all of our data is pretty much static.

We can also do that in a follow-up and then parse the output in a test with a proper Prometheus parser.
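
For illustration, a hand-rolled serializer along those lines could stay very small. This is only a sketch with made-up metric names, and it assumes everything is exposed as a gauge:

use std::fmt::Write;

// Sketch of a dependency-free serializer: renders (name, value) pairs as
// Prometheus gauges in the text exposition format, one sample per line.
fn to_prometheus(metrics: &[(&str, f64)]) -> String {
    let mut out = String::new();
    for &(name, value) in metrics {
        // The TYPE line is optional but makes the output self-describing.
        let _ = writeln!(out, "# TYPE {name} gauge");
        let _ = writeln!(out, "{name} {value}");
    }
    out
}

Called with, say, [("up", 1.0), ("memory_used", 123.0)], this produces the same shape of output that the test below asserts on.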

relay = relay(mini_sentry)
response = relay.get("/api/relay/keda/")
assert response.status_code == 200
assert "up 1" in response.text
Member:

Unrelated note: Wish we had snapshot tests in Python.
