[vLLM accelerated inference] Since CosyVoice 2.0's LLM module is Qwen-based, will a vLLM acceleration adaptation be provided? #873
Comments
vLLM does not support embedding inputs for now. We will consider vLLM acceleration, but it is still under investigation.
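For context on why embedding input matters: CosyVoice 2's LLM is fed precomputed embeddings (text-token embeddings concatenated with speech/prompt embeddings), not a plain token-ID prompt, which is what vLLM's API expected at the time. A minimal sketch of the pattern with Hugging Face transformers, using placeholder tensors (the concatenation layout here is illustrative, not CosyVoice's exact one):

```python
import torch
from transformers import AutoModelForCausalLM

# Illustration: a CosyVoice-style LLM input is built by concatenating
# embeddings from different sources, so it cannot be expressed as a
# single list of token IDs.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

text_ids = torch.tensor([[1, 2, 3]])  # placeholder text token IDs
# Placeholder for speech-token embeddings, which in CosyVoice come
# from a separate embedding table over the speech vocabulary.
speech_emb = torch.randn(1, 8, model.config.hidden_size)

text_emb = model.get_input_embeddings()(text_ids)
inputs_embeds = torch.cat([text_emb, speech_emb], dim=1)

# transformers accepts inputs_embeds directly; a serving engine that
# only takes `prompt` or `prompt_token_ids` (as vLLM did when this
# thread was written) has no equivalent entry point.
out = model(inputs_embeds=inputs_embeds)
print(out.logits.shape)  # (1, 11, vocab_size)
```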
I don't think that will make much of a difference. We're talking about a 0.5B-parameter model, and it's not as if it needs to be batched. It would be a nice-to-have optimization, but I don't think it's a big-impact play, to be honest.
I'm not an expert in vLLM, but I think vLLM also has inference optimizations such as PagedAttention. In any case, we may or may not provide similar inference optimizations through our Aliyun service.
vLLM has embedding support, but that is only relevant for the LLM part. There is a lot more to the pipeline; I think the biggest bottleneck here is the flow matcher.
We already provide TensorRT for flow matching; this is the inference method used by our Aliyun service.
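For anyone looking for the equivalent in the open-source repo: the general recipe is to export the flow-matching estimator to ONNX and compile it with TensorRT. A rough sketch of that route, with a hypothetical stand-in module and placeholder shapes (the real estimator and export logic live in the repo's export scripts):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the flow-matching estimator; the real
# network is the flow decoder's estimator module in the repo.
class EstimatorStub(nn.Module):
    def __init__(self, mel_dim: int = 80):
        super().__init__()
        self.net = nn.Conv1d(2 * mel_dim + 1, mel_dim, kernel_size=3, padding=1)

    def forward(self, x, mu, t):
        # Broadcast the scalar timestep over the sequence and fuse it
        # with the noisy mel and the conditioning features.
        t = t.view(-1, 1, 1).expand(-1, 1, x.size(2))
        return self.net(torch.cat([x, mu, t], dim=1))

estimator = EstimatorStub().eval()
x = torch.randn(1, 80, 200)   # noisy mel spectrogram (placeholder shape)
mu = torch.randn(1, 80, 200)  # conditioning features (placeholder)
t = torch.rand(1)             # flow-matching timestep

torch.onnx.export(
    estimator, (x, mu, t), "estimator.onnx",
    input_names=["x", "mu", "t"], output_names=["dx"],
    dynamic_axes={"x": {2: "T"}, "mu": {2: "T"}, "dx": {2: "T"}},
    opset_version=17,
)
# Then build a TensorRT engine from the ONNX file, e.g.:
#   trtexec --onnx=estimator.onnx --saveEngine=estimator.plan --fp16
```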
So are there any other optimization ideas for the LLM module?
SGLang can take input embeddings.
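Not verified against CosyVoice, but roughly, recent SGLang versions accept an input_embeds field on the /generate endpoint. A hypothetical sketch of calling it with precomputed embeddings (server address, field names, and shapes are assumptions):

```python
import requests
import torch

# Assumes a server started along the lines of:
#   python -m sglang.launch_server --model-path Qwen/Qwen2.5-0.5B
# and an SGLang version whose /generate accepts "input_embeds".
hidden_size = 896  # Qwen2.5-0.5B hidden size
inputs_embeds = torch.randn(16, hidden_size)  # placeholder embedding sequence

resp = requests.post(
    "http://127.0.0.1:30000/generate",
    json={
        "input_embeds": inputs_embeds.tolist(),
        "sampling_params": {"max_new_tokens": 32, "temperature": 0.8},
    },
)
print(resp.json())
```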
@WuNein Hi, have you tried adapting CosyVoice for acceleration with SGLang?
I see the LLM is where this computation happens. Will a vLLM acceleration adaptation be done for it? Is there any reference for this?