feat(relay): expose internal relay metrics #4497
base: master
Conversation
# Conflicts:
#   Cargo.lock
#   CHANGELOG.md
Maybe, as suggested, we work on this in multiple steps:
- Expose the endpoint with basic/mock internal machinery (see the sketch after this list)
- Expose the easier metrics first (e.g. prevent scale down)
- Tackle the harder metrics, like utilization of services, generically
- Then, possibly afterwards, instrument the special services we have, like the processor and store
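A minimal sketch of what that first step could look like, assuming the axum-style routing already used in relay-server; the handler and its hard-coded body are hypothetical placeholders for the real internal machinery:

```rust
use axum::http::StatusCode;

/// Mock handler: returns a static payload until real metric collection is wired up.
pub async fn handle() -> (StatusCode, String) {
    // Hypothetical placeholder body; real values would be gathered from services.
    (StatusCode::OK, "up 1\n".to_owned())
}
```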
relay-server/src/endpoints/mod.rs
.route("/api/relay/events/{event_id}/", get(events::handle)); | ||
let internal_routes = internal_routes | ||
.route("/api/relay/events/{event_id}/", get(events::handle)) | ||
.route("/api/relay/internal-metrics", get(internal_metrics::handle)) |
Not sure what we want to call the route; maybe we want to make it auto-scaling specific? "Metrics" is already so overloaded.
Fair point, maybe just /keda?
I think we should stick to the more generic autoscaling or similar, since this isn't directly consumed by Keda, but the purpose is auto scaling.
Sounds good, I'll rename it.
@Dav1dde I went with the current approach because it seemed easier than expected. I will take a step back and change it to the easier stuff first, as you suggested.
```rust
    }
};

match serde_prometheus::to_string(&data, None, HashMap::new()) {
```
This adds a bunch of dependencies; maybe we can just serialize it ourselves. The format is quite simple, and all of our data is pretty much static.
We can also do that in a follow-up and then parse the output in a test with a proper Prometheus parser.
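For illustration, a minimal sketch of what hand-rolled serialization could look like, assuming a small fixed set of gauge-style values; the RelayMetrics struct and the metric names are hypothetical:

```rust
use std::fmt::Write;

/// Hypothetical container for the handful of static values we expose.
struct RelayMetrics {
    up: u64,
    memory_used_bytes: u64,
}

/// Serializes the metrics into the Prometheus text exposition format,
/// one `<name> <value>` line per metric; labels are omitted since the data is static.
fn to_prometheus_text(metrics: &RelayMetrics) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so the Results can be ignored.
    let _ = writeln!(out, "up {}", metrics.up);
    let _ = writeln!(out, "memory_used_bytes {}", metrics.memory_used_bytes);
    out
}
```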
```python
relay = relay(mini_sentry)
response = relay.get("/api/relay/keda/")
assert response.status_code == 200
assert "up 1" in response.text
```
Unrelated note: Wish we had snapshot tests in Python.
This PR adds an endpoint that can be used to query internal Relay metrics. In its current state it exposes internally collected memory metrics and a simple "up" metric so that we can count the number of instances running.
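Assuming the standard Prometheus text exposition format, the response body would look something like the following; the "up 1" line matches the integration test above, while the memory metric name is a hypothetical example:

```
up 1
memory_used_bytes 123456789
```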