CosyVoice 2.0 streaming generation produces popping noise #895

zsytm · 2025-01-17T02:03:45Z

async def inference_cross_lingual(item: dict = Body(...)):
    item_encoded = jsonable_encoder(item)
    tts_text = item_encoded.get('query')
    prompt_speech_16k = load_wav('/opt/CosyVoice/asset/zero_shot_prompt.wav', 16000)

    def compress_audio(audio_data, threshold=0.5, ratio=4.0):
        # Compute the amplitude of the audio
        amplitude = torch.abs(audio_data)
        # Compress the part that exceeds the threshold
        mask = amplitude > threshold
        compressed = torch.where(mask,
                                 threshold + (amplitude - threshold) / ratio,
                                 amplitude)
        # Preserve the sign of the original signal
        return compressed * torch.sign(audio_data)

    async def audio_stream():
        for i, j in enumerate(cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=True)):
            audio_data = j['tts_speech']
            audio_data = compress_audio(audio_data)
            buffer = io.BytesIO()
            torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
            buffer.seek(0)
            yield buffer.read()

    return StreamingResponse(audio_stream(), media_type="audio/wav")

With streaming output, the audio contains popping noise.
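
One thing that may be worth checking, as an assumption not confirmed anywhere in this thread: the handler above saves every chunk as a complete WAV file, so the concatenated response body carries one container header per chunk. A player that reads only the first header could treat the later headers as sample data. The sketch below uses the standard-library `wave` module instead of `torchaudio`, with silent placeholder chunks and a placeholder rate of 24000, so header sizes in the real service may differ.

```python
import io
import wave


def chunk_to_wav_bytes(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap one chunk of 16-bit mono PCM in a complete WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Three hypothetical chunks of silence standing in for model output.
chunks = [b"\x00\x00" * 100 for _ in range(3)]

# Concatenate the per-chunk files the way a streaming response body would.
stream = b"".join(chunk_to_wav_bytes(chunk, 24000) for chunk in chunks)

# Count container headers and the bytes that are not audio samples.
riff_count = stream.count(b"RIFF")
header_bytes = len(stream) - sum(len(chunk) for chunk in chunks)
print(riff_count, header_bytes)
```

If the real stream shows the same pattern, one option to try would be sending a single header followed by raw PCM, or raw PCM only with the sample rate agreed out of band.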

cpken · 2025-01-17T02:56:22Z

Try replacing cosyvoice.sample_rate with 16000 and check whether the popping is caused by a sample-rate mismatch.

- torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
+ torchaudio.save(buffer, audio_data, 16000, format='wav')
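
To make the sample-rate check concrete: when the rate written into the WAV header differs from the rate the samples were generated at, playback duration (and pitch) changes by the ratio of the two. The sketch below shows that arithmetic with the standard-library `wave` module. Both rates are illustrative assumptions, since the thread does not state the actual value of `cosyvoice.sample_rate`.

```python
import io
import wave

GENERATED_RATE = 24000  # hypothetical rate the samples were produced at
HEADER_RATE = 16000     # hypothetical rate written into the WAV header

# One second of 16-bit mono silence at the generated rate.
pcm = b"\x00\x00" * GENERATED_RATE

# Write the samples with a header that declares a different rate.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 16-bit samples
    wav_file.setframerate(HEADER_RATE)
    wav_file.writeframes(pcm)

# Read the file back the way a player would and compute its duration.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_file:
    playback_seconds = wav_file.getnframes() / wav_file.getframerate()

print(playback_seconds)
```

Comparing the duration of the saved audio against the expected duration is a quick way to tell whether the two rates agree.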

aluminumbox · 2025-01-17T08:06:35Z

Please provide the synthesis text and sample code, following the example in the README.
