Batch inference for the "text-to-semantic" llm #889

hbwu-ntu · 2025-01-16T00:44:25Z

Is your feature request related to a problem? Please describe.
I'm always frustrated when I want to synthesize large-scale text-semantic sequence pairs, because the current function:

- entangles the text-to-semantic function with the token-to-wav function
- only supports inference with a batch size of 1

Describe the solution you'd like
Disentangle this function (https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/cli/model.py#L100) and support batch inference:

- the input is a text sequence
- the output is the corresponding semantic token sequence
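
To make the request concrete, here is a rough sketch of the kind of interface meant above: a batch of texts goes in, one semantic token sequence per text comes out, and no waveform is produced. Every name here (`pad_batch`, `batch_text_to_semantic`, `tokenize`, `generate_batch`, `pad_id`) is hypothetical and is not part of CosyVoice. The model call is passed in as a plain callable, so this only illustrates the padding and bookkeeping a batched path would need.

```python
from typing import Callable, List, Sequence, Tuple


def pad_batch(
    seqs: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad variable-length token sequences to a common length.

    Returns the padded batch and the original lengths, so a model can
    mask out the padding positions.
    """
    lengths = [len(s) for s in seqs]
    max_len = max(lengths, default=0)
    padded = [list(s) + [pad_id] * (max_len - len(s)) for s in seqs]
    return padded, lengths


def batch_text_to_semantic(
    texts: Sequence[str],
    tokenize: Callable[[str], List[int]],
    generate_batch: Callable[[List[List[int]], List[int]], List[List[int]]],
    pad_id: int = 0,
) -> List[List[int]]:
    """Hypothetical batched interface: texts in, semantic tokens out.

    `tokenize` and `generate_batch` are stand-ins (assumptions) for the
    text frontend and the text-to-semantic LLM.
    """
    token_ids = [tokenize(t) for t in texts]
    padded, lengths = pad_batch(token_ids, pad_id)
    return generate_batch(padded, lengths)
```

With toy stand-ins such as `tokenize=lambda t: [ord(c) for c in t]`, the function can be exercised without any model. A real implementation would also need per-sequence stopping, since each item in the batch finishes generating at a different step.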

Describe alternatives you've considered
Disentangle this function (https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/cli/model.py#L100) and support batch inference:

- the input is a text sequence
- the output is the corresponding semantic token sequence

Additional context
N/A

aluminumbox · 2025-01-16T01:33:59Z

This is currently not in our plans. You can use multiple processes or multiple threads to speed up inference.
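
A minimal sketch of the multi-thread variant, assuming a hypothetical per-text callable `text_to_semantic` that wraps a single batch-size-1 inference call. That callable and `parallel_text_to_semantic` are illustrative names, not CosyVoice APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def parallel_text_to_semantic(
    texts: Sequence[str],
    text_to_semantic: Callable[[str], List[int]],
    max_workers: int = 4,
) -> List[List[int]]:
    """Run a batch-size-1 function over many texts using a thread pool.

    `text_to_semantic` is a hypothetical stand-in for one inference call.
    Results come back in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if calls finish out of order.
        return list(pool.map(text_to_semantic, texts))
```

How much this helps depends on how much of each call runs outside the Python GIL. For the multi-process variant, one option is to give each worker process its own model copy and its own shard of the text list.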
