CosyVoice 2.0 streaming generation produces popping noises #895

Open
zsytm opened this issue Jan 17, 2025 · 2 comments

zsytm commented Jan 17, 2025

async def inference_cross_lingual(item: dict = Body(...)):
    item_encoded = jsonable_encoder(item)
    tts_text = item_encoded.get('query')
    prompt_speech_16k = load_wav('/opt/CosyVoice/asset/zero_shot_prompt.wav', 16000)

    def compress_audio(audio_data, threshold=0.5, ratio=4.0):
        # Compute the amplitude of the audio
        amplitude = torch.abs(audio_data)
        # Compress the part that exceeds the threshold
        mask = amplitude > threshold
        compressed = torch.where(mask,
                                 threshold + (amplitude - threshold) / ratio,
                                 amplitude)
        # Keep the sign of the original signal
        return compressed * torch.sign(audio_data)

    async def audio_stream():
        for i, j in enumerate(cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=True)):
            audio_data = j['tts_speech']
            audio_data = compress_audio(audio_data)
            # Each chunk is written to its own in-memory WAV file before being yielded
            buffer = io.BytesIO()
            torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
            buffer.seek(0)
            yield buffer.read()

    return StreamingResponse(audio_stream(), media_type="audio/wav")

With streaming output, the audio contains popping noises.
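
One thing worth ruling out, though nobody in this thread confirms it as the cause: the handler above calls torchaudio.save(..., format='wav') once per chunk, so every yielded piece is a complete WAV file with its own header. A client that treats the response as one WAV stream may decode the later headers as samples, which can be heard as clicks at chunk boundaries. The sketch below uses only the standard library to show the extra headers and one alternative (a single header followed by raw 16-bit PCM). The 24000 Hz rate, the helper names, and the 0xFFFFFFFF placeholder sizes are assumptions for illustration, and not every player accepts placeholder sizes.

```python
import io
import struct
import wave


def wav_bytes(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap 16-bit mono PCM in a complete WAV file (header + data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def streaming_wav_header(sample_rate: int) -> bytes:
    """44-byte PCM WAV header with placeholder sizes for a stream of unknown length."""
    byte_rate = sample_rate * 2  # mono, 2 bytes per sample
    return (
        b"RIFF" + struct.pack("<I", 0xFFFFFFFF) + b"WAVE"
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, sample_rate, byte_rate, 2, 16)
        + b"data" + struct.pack("<I", 0xFFFFFFFF)
    )


RATE = 24000  # assumed output rate, for illustration only
chunks = [struct.pack("<4h", 0, 0, 0, 0) for _ in range(3)]  # three tiny silent chunks

# Per-chunk WAV files, as in the handler above: one header per chunk.
per_chunk = b"".join(wav_bytes(c, RATE) for c in chunks)

# Alternative: one header, then only raw PCM for every chunk.
single = streaming_wav_header(RATE) + b"".join(chunks)

print(per_chunk.count(b"RIFF"))  # 3
print(single.count(b"RIFF"))     # 1
```

In the handler this would mean yielding the header once and then the int16 bytes of each chunk, instead of calling torchaudio.save per chunk.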

cpken commented Jan 17, 2025

Try replacing cosyvoice.sample_rate with 16000 and see whether the popping is caused by a sample rate mismatch.

- torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
+ torchaudio.save(buffer, audio_data, 16000, format='wav')
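
As a rough guide to what this test would reveal: the rate passed to torchaudio.save only labels the samples in the file header, so if it differs from the rate the samples were generated at, playback generally comes out at the wrong speed and pitch. A minimal standard-library sketch, assuming one second of audio generated at 24000 Hz (an illustrative value, not confirmed in this thread) and a hypothetical helper name:

```python
import io
import wave


def played_duration(num_samples: int, header_rate: int) -> float:
    """Duration a player would report for mono 16-bit PCM labeled with header_rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(header_rate)
        w.writeframes(b"\x00\x00" * num_samples)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        return r.getnframes() / r.getframerate()


one_second_at_24k = 24000  # number of samples; 24000 Hz is an assumed model rate
print(played_duration(one_second_at_24k, 24000))  # 1.0
print(played_duration(one_second_at_24k, 16000))  # 1.5
```

If the audio sounds slowed down or lower in pitch after the change, the original rate was probably the right one and the popping has another source.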

@aluminumbox
Collaborator

Please provide the synthesis text and sample code, following the example in the README.
