Currently the implementation is hardcoded for vLLM. We need a way to make it configurable so that it can support other model servers, for example by passing in the names of the metrics that the algorithm depends on rather than hardcoding them.
Currently it's not quite feasible to have a generic solution for any model server until we finalize the model server protocol and implement it, which will take time. Plus, implementing support for more model servers helps us discover new requirements that should go into the protocol.
Luckily, our interaction surface with model servers is really small (mostly scraping metrics), so I propose the following short-term solution:
Add a modelServer flag to the ext-proc binary to tell which model server to use.
Add an enableLoRA flag to the ext-proc binary. If LoRA is enabled, then we will scrape LoRA metrics.
Add an internal map data structure to define the model server to metric name mapping, and extract the helper functions from the existing vLLM implementation.
If there is any bespoke logic we need for a new model server, consider adding a new implementation in the backend package. But we need to be mindful and keep this minimal.
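To make the proposal concrete, here is a minimal Go sketch of the two flags and the metric-name map. It is an illustration under assumptions, not the actual ext-proc code: the `MetricNames` type, the `metricNamesByServer` map, the `metricNamesFor` helper, and the `example-server` entry are all hypothetical, and the vLLM metric names shown should be checked against the metrics the current implementation scrapes.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// MetricNames lists the metric names the scheduling algorithm depends on.
// The set of fields is an assumption made for this sketch.
type MetricNames struct {
	QueueSize    string // number of requests waiting in the queue
	KVCacheUsage string // KV cache utilization
	LoRAInfo     string // LoRA adapter info; only scraped when LoRA is enabled
}

// metricNamesByServer is the internal model server -> metric name mapping.
// "example-server" and its metric names are hypothetical placeholders.
var metricNamesByServer = map[string]MetricNames{
	"vllm": {
		QueueSize:    "vllm:num_requests_waiting",
		KVCacheUsage: "vllm:gpu_cache_usage_perc",
		LoRAInfo:     "vllm:lora_requests_info",
	},
	"example-server": {
		QueueSize:    "example_queue_size",
		KVCacheUsage: "example_kv_cache_usage",
		// No LoRA metric defined for this placeholder server.
	},
}

// metricNamesFor returns the metric names to scrape for the given server.
// LoRA metrics are dropped when enableLoRA is false, and requesting LoRA
// for a server without a LoRA metric is reported as an error.
func metricNamesFor(modelServer string, enableLoRA bool) (MetricNames, error) {
	names, ok := metricNamesByServer[modelServer]
	if !ok {
		return MetricNames{}, fmt.Errorf("unsupported model server %q", modelServer)
	}
	if enableLoRA && names.LoRAInfo == "" {
		return MetricNames{}, fmt.Errorf("model server %q has no LoRA metric configured", modelServer)
	}
	if !enableLoRA {
		names.LoRAInfo = ""
	}
	return names, nil
}

func main() {
	modelServer := flag.String("modelServer", "vllm", "model server to scrape metrics from")
	enableLoRA := flag.Bool("enableLoRA", false, "whether to scrape LoRA metrics")
	flag.Parse()

	names, err := metricNamesFor(*modelServer, *enableLoRA)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", names)
}
```

With this shape, supporting a new model server that only differs in metric names would mean adding one map entry; anything that needs bespoke logic would still go into a separate implementation in the backend package.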