Support for model servers other than vllm #95

ahg-g · 2024-12-12T00:35:41Z

Currently the implementation is hardcoded for vllm. We need a way to make it configurable so that it can support other model servers; for example, pass in the names of the metrics that the algorithm depends on rather than hardcoding them.

liu-cong · 2025-01-08T18:43:35Z

A generic solution for any model server is not feasible until we finalize and implement the model server protocol, which will take time. Also, implementing support for more model servers helps us discover new requirements that should go into the protocol.

Luckily, our interaction surface with model servers is really small (mostly scraping metrics), so I propose the following short-term solution:

Add a modelServer flag to the ext-proc binary to select which model server to use.
Add an enableLoRA flag to the ext-proc binary. If LoRA is enabled, we will also scrape LoRA metrics.
Add an internal map data structure that defines the model server to metric name mapping, and extract the helper functions from the existing vllm implementation.
If a new model server needs any bespoke logic, consider adding a new implementation in the backend package. But we need to be mindful and keep this minimal.
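
To make the proposal concrete, here is a minimal sketch of how the two flags and the internal map could fit together. It is not the project's actual code: the `MetricNames` type, the `metricsByServer` map, the `resolveMetricNames` helper, and the `example-server` entry are all hypothetical names introduced for illustration. The vllm metric names are assumed to match what vLLM exposes and should be checked against the vLLM version in use.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// MetricNames holds the names of the metrics the scheduling algorithm
// scrapes from a model server. (Hypothetical type for illustration.)
type MetricNames struct {
	QueueSize    string // number of requests waiting in the server queue
	KVCacheUsage string // KV cache utilization
	LoRAInfo     string // LoRA adapter info; empty when LoRA is not scraped
}

// metricsByServer is the internal model server to metric name mapping.
// The vllm names are assumptions to verify against your vLLM version;
// "example-server" and its metric names are placeholders, not a real server.
var metricsByServer = map[string]MetricNames{
	"vllm": {
		QueueSize:    "vllm:num_requests_waiting",
		KVCacheUsage: "vllm:gpu_cache_usage_perc",
		LoRAInfo:     "vllm:lora_requests_info",
	},
	"example-server": {
		QueueSize:    "example_queue_size",
		KVCacheUsage: "example_kv_cache_usage",
		LoRAInfo:     "example_lora_info",
	},
}

// resolveMetricNames looks up the metric names for the selected model
// server and clears the LoRA metric name when LoRA scraping is disabled.
func resolveMetricNames(modelServer string, enableLoRA bool) (MetricNames, error) {
	names, ok := metricsByServer[modelServer]
	if !ok {
		return MetricNames{}, fmt.Errorf("unsupported model server %q", modelServer)
	}
	if !enableLoRA {
		names.LoRAInfo = ""
	}
	return names, nil
}

func main() {
	modelServer := flag.String("modelServer", "vllm", "model server to scrape metrics from")
	enableLoRA := flag.Bool("enableLoRA", false, "whether to scrape LoRA metrics")
	flag.Parse()

	names, err := resolveMetricNames(*modelServer, *enableLoRA)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", names)
}
```

With this shape, supporting a new model server that only differs in metric names is a one-entry change to the map; anything beyond that would fall under the bespoke-implementation case in the last item.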

This was referenced Jan 8, 2025

Integrate with Triton TensorRT-LLM #170 (Open)

Integrate with TGI model server #171 (Open)
