
[vLLM accelerated inference] CosyVoice 2.0's LLM module is Qwen-based; will a vLLM acceleration adaptation be provided? #873

Open
wang-TJ-20 opened this issue Jan 11, 2025 · 8 comments

Comments

@wang-TJ-20

I see the LLM computation happens here (screenshot below). Will a vLLM acceleration adaptation be made? Is there any reference for this?
[screenshot: the LLM inference code in question]

@aluminumbox
Collaborator

vLLM does not support embedding inputs for now. We will consider vLLM acceleration, but it is still under investigation.
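
For context on why this matters: CosyVoice 2's LLM is driven with a mixed sequence of text and speech-token embeddings rather than plain token IDs, while vLLM's generate path takes text or token IDs. A minimal sketch of the pattern, with hypothetical shapes (896 is Qwen2-0.5B's hidden size; the tensor names are illustrative, not CosyVoice internals):

```python
import torch
from transformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

# Illustrative inputs: text prompt and speech tokens already embedded.
text_emb = torch.randn(1, 12, 896)    # (batch, text_len, hidden)
speech_emb = torch.randn(1, 30, 896)  # (batch, speech_len, hidden)

# The LM consumes inputs_embeds, not input_ids; a token-in/token-out
# serving engine has no public equivalent of this call.
inputs_embeds = torch.cat([text_emb, speech_emb], dim=1)
out = llm(inputs_embeds=inputs_embeds)
next_token_logits = out.logits[:, -1]
```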

@darkacorn

I don't think that will make much of a difference. We're talking about a 0.5B-parameter model, and it's not as if it needs to be batched.

It would be a nice-to-have optimization, but I don't think it's a big-impact play, to be honest.

@aluminumbox
Collaborator

> I don't think that will make much of a difference. We're talking about a 0.5B-parameter model, and it's not as if it needs to be batched.
>
> It would be a nice-to-have optimization, but I don't think it's a big-impact play, to be honest.

I'm not an expert in vLLM, but I think vLLM also has inference optimizations like PagedAttention. In any case, we may or may not provide inference optimizations similar to our Aliyun service.
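
For reference, vLLM's standard entry point looks like this; PagedAttention and continuous batching are applied behind it, but the API accepts text (or token IDs) only, which is the limitation discussed above (the model name is just an example):

```python
from vllm import LLM, SamplingParams

# Standard vLLM usage: prompts in, text out. PagedAttention and
# continuous batching happen inside the engine; there is no public
# inputs_embeds path analogous to a Hugging Face forward() call.
llm = LLM(model="Qwen/Qwen2-0.5B")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```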

@darkacorn

vLLM has embeddings, but that's only relevant for the LLM part. There's a lot more to the pipeline; I think the biggest bottleneck here is the flow matcher.

@aluminumbox
Collaborator

> vLLM has embeddings, but that's only relevant for the LLM part. There's a lot more to the pipeline; I think the biggest bottleneck here is the flow matcher.

We already provide TensorRT for flow matching; this is the inference method used by our Aliyun service.
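
For anyone reproducing this locally: the usual route is an ONNX export of the flow-matching estimator followed by a TensorRT engine build. A hedged sketch, where the estimator module, input names, and shapes are all illustrative assumptions rather than CosyVoice's actual export script:

```python
import torch

# Hypothetical stand-in for the flow-matching estimator network;
# the real module would come from the flow decoder.
class Estimator(torch.nn.Module):
    def forward(self, x, mask, mu, t):
        # Predict dx/dt; the real model is a conditional network.
        return (mu - x) * mask * t.view(-1, 1, 1)

estimator = Estimator().eval()

x = torch.randn(1, 80, 200)   # noisy mel (shapes are assumptions)
mask = torch.ones(1, 1, 200)
mu = torch.randn(1, 80, 200)  # conditioning features
t = torch.rand(1)             # flow-matching time step

torch.onnx.export(
    estimator, (x, mask, mu, t), "estimator.onnx",
    input_names=["x", "mask", "mu", "t"], output_names=["dx"],
    dynamic_axes={"x": {2: "T"}, "mask": {2: "T"}, "mu": {2: "T"}},
    opset_version=17,
)
# Build the engine afterwards, e.g.:
#   trtexec --onnx=estimator.onnx --saveEngine=estimator.plan --fp16
```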

@wang-TJ-20
Author

> > I don't think that will make much of a difference. We're talking about a 0.5B-parameter model, and it's not as if it needs to be batched.
> >
> > It would be a nice-to-have optimization, but I don't think it's a big-impact play, to be honest.
>
> I'm not an expert in vLLM, but I think vLLM also has inference optimizations like PagedAttention. In any case, we may or may not provide inference optimizations similar to our Aliyun service.

So are there any other optimization ideas for the LLM module?

@WuNein

WuNein commented Jan 16, 2025

SGLang can take input embeddings:

sgl-project/sglang#2052
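
A hypothetical sketch of what that could look like for the LLM step, based on the linked PR; the `input_embeds` keyword and payload layout here are assumptions, so check the PR for the actual interface:

```python
import sglang as sgl

# Offline SGLang engine; model name is just an example.
engine = sgl.Engine(model_path="Qwen/Qwen2-0.5B")

# CosyVoice-style mixed text/speech embeddings, as a nested list of
# shape [seq_len][hidden_size] (896 = Qwen2-0.5B hidden size).
embeds = [[0.0] * 896 for _ in range(42)]

# NOTE: the `input_embeds` argument is assumed from sgl-project/sglang#2052,
# not a documented stable API at the time of this thread.
out = engine.generate(
    input_embeds=embeds,
    sampling_params={"temperature": 0.8, "max_new_tokens": 64},
)
print(out)
```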

@wang-TJ-20
Author

@WuNein Hi, have you tried adapting CosyVoice to SGLang for acceleration?
