async def inference_cross_lingual(item: dict = Body(...)):
    item_encoded = jsonable_encoder(item)
    tts_text = item_encoded.get('query')
    prompt_speech_16k = load_wav('/opt/CosyVoice/asset/zero_shot_prompt.wav', 16000)

    def compress_audio(audio_data, threshold=0.5, ratio=4.0):
        # Compute the amplitude of the audio
        amplitude = torch.abs(audio_data)
        # Compress the part that exceeds the threshold
        mask = amplitude > threshold
        compressed = torch.where(mask,
                                 threshold + (amplitude - threshold) / ratio,
                                 amplitude)
        # Preserve the sign of the original signal
        return compressed * torch.sign(audio_data)

    async def audio_stream():
        for i, j in enumerate(cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=True)):
            audio_data = j['tts_speech']
            audio_data = compress_audio(audio_data)
            buffer = io.BytesIO()
            torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
            buffer.seek(0)
            yield buffer.read()

    return StreamingResponse(audio_stream(), media_type="audio/wav")
With streaming output, the audio contains popping noises.
Try replacing cosyvoice.sample_rate with 16000 and check whether the popping is caused by a sample-rate mismatch.
- torchaudio.save(buffer, audio_data, cosyvoice.sample_rate, format='wav')
+ torchaudio.save(buffer, audio_data, 16000, format='wav')
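Besides the sample rate, another thing that may be worth checking: the handler above calls torchaudio.save once per chunk, so every chunk in the response starts with its own WAV header, and a player that treats the response as a single WAV file could play those header bytes as audio. Whether that is the cause of the popping here is not confirmed in this thread. The following is a minimal sketch, using only the standard library, of sending one header followed by raw 16-bit PCM. The helper names (wav_stream_header, float_to_pcm16, stream_wav) are hypothetical, and it assumes the samples are mono floats in [-1, 1].

```python
import struct
from typing import Iterable, Iterator


def wav_stream_header(sample_rate: int, channels: int = 1, bits: int = 16) -> bytes:
    """Build a 44-byte PCM WAV header for a stream whose length is not known yet."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    # Assumption: 0xFFFFFFFF is used as a placeholder size; players differ in
    # how they handle a size that does not match the actual data length.
    unknown = 0xFFFFFFFF
    fmt_chunk = struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                            byte_rate, block_align, bits)
    return (b"RIFF" + struct.pack("<I", unknown) + b"WAVE"
            + b"fmt " + fmt_chunk
            + b"data" + struct.pack("<I", unknown))


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Clip floats to [-1, 1] and encode them as little-endian signed 16-bit PCM."""
    values = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(values)}h", *values)


def stream_wav(chunks: Iterable[Iterable[float]], sample_rate: int) -> Iterator[bytes]:
    """Yield a single WAV header, then the PCM bytes of each chunk."""
    yield wav_stream_header(sample_rate)
    for chunk in chunks:
        yield float_to_pcm16(chunk)
```

In the handler, each chunk could be passed in as j['tts_speech'].flatten().tolist(), assuming tts_speech is a float tensor holding one channel, with the sample rate set to whatever rate the model actually outputs.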
Please provide the synthesis text and sample code, following the example in the README.