Add model server configurations to InferencePool #163
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: liu-cong. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
@liu-cong: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
I think we should be cautious about adjusting the API. I think both of your bullet points are handled by a well-defined model server protocol.
Additionally, a user could, in theory, mix and match their model servers, and we should not be concerned with that, since they would all implement the same protocol.
Valid point. The challenge is that it will take time to get the protocol implemented, and in the meantime we don't want to block development of the extension. What do you think about a tradeoff: adding these as flags to the ext-proc binary? That way we don't need to change the API while unblocking short-term development.
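For illustration, a minimal sketch of what such ext-proc flags could look like; the flag names below are hypothetical, not the binary's actual flags:

```go
package main

import (
	"flag"
	"log"
)

func main() {
	// Hypothetical flags: the real ext-proc binary may expose different
	// names. The point is that model-server-specific knobs live in the
	// extension's deployment, not in the InferencePool API.
	modelServerType := flag.String("model-server-type", "vllm",
		"Model server implementation the extension targets.")
	metricsPath := flag.String("model-server-metrics-path", "/metrics",
		"Path on the model server pods that exposes Prometheus metrics.")
	flag.Parse()

	log.Printf("ext-proc configured for %s, scraping %s", *modelServerType, *metricsPath)
}
```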
Yeah, I think that's a great tradeoff. @ahg-g WDYT?
/close Given the discussion above, this can be handled via an ext-proc flag in the short term, and as much as possible will be handled by the model server protocol in the long term (#164).
+1; we certainly don't want the API to explicitly list the different model servers. We need to define a protocol, and if the protocol requires setting some parameters, those could be part of the API, but they should not be specific to the model server. For example, we could hypothetically allow configuring metric names in the API (e.g., the name of the metric that represents KV-cache utilization). But even those could be configured in a generic key/value map in the API, because different extensions may rely on completely different sets of metrics, and the model server protocol could define the keys for known metrics.
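A rough sketch of that generic key/value idea, assuming a hypothetical ExtensionParams field and a protocol-defined metric key; none of these names are part of the actual API:

```go
package v1alpha1 // hypothetical API package

// InferencePoolSpec sketch: rather than model-server-specific fields, the
// API carries an opaque map whose well-known keys are defined by the model
// server protocol, keeping the API agnostic to any particular model server.
type InferencePoolSpec struct {
	// ExtensionParams maps protocol-defined keys to values, e.g. which
	// metric name reports KV-cache utilization on the model server.
	ExtensionParams map[string]string `json:"extensionParams,omitempty"`
}

// kvCacheMetricName shows how an extension might resolve a protocol-defined
// key; both the key and the default below are illustrative only.
func kvCacheMetricName(spec InferencePoolSpec) string {
	if name, ok := spec.ExtensionParams["metrics.kv-cache-utilization"]; ok {
		return name
	}
	return "vllm:gpu_cache_usage_perc" // illustrative default
}
```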
Add a ModelServerAttributes field to capture the following information:
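A purely hypothetical sketch of the shape such a ModelServerAttributes field might take, assuming it carries the kind of metric configuration discussed above; all field names here are illustrative:

```go
package v1alpha1 // hypothetical API package

// ModelServerAttributes is an illustrative reconstruction of the field this
// PR proposed; the actual fields may differ.
type ModelServerAttributes struct {
	// Kind identifies the model server implementation, e.g. "vllm"
	// (illustrative field).
	Kind string `json:"kind,omitempty"`
	// KVCacheUtilizationMetric names the metric that reports KV-cache
	// utilization, per the discussion above (illustrative field).
	KVCacheUtilizationMetric string `json:"kvCacheUtilizationMetric,omitempty"`
}
```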