large-v2 does not support the "yue" but no bug in faster-whisper #1228

Open
Coconut3223 opened this issue Jan 17, 2025 · 0 comments

First, large-v2 does not support the "yue" language token, whereas large-v3 does.

However, when I use faster-whisper to load large-v2 and transcribe with the parameter language="yue", no error is raised. It runs, but the behavior is wrong.


from faster_whisper import WhisperModel
model = WhisperModel("large-v2",
                    device=DEVICE,
                    )
>>> print(model.hf_tokenizer.token_to_id("<|%s|>" % "yue"))
None
>>> print(model.hf_tokenizer.token_to_id("<|%s|>" % "zh"))
xxx
input_language = 'yue'
transcribe_params = {
    "language": input_language,
    "word_timestamps": True,
    "vad_filter": True,
    "initial_prompt": initial_prompt,
    "vad_parameters": dict(min_silence_duration_ms=1000,),
}
whisper_segments, info = model.transcribe(audio, **transcribe_params)
for whis_seg in whisper_segments:
    print(whis_seg.text.strip())
print(info)

""" Result
2023-2024年度修訂預算,
受環球利率上升的
TranscriptionInfo(language='yue', language_probability=1, ....)
"""
input_language = 'zh'
transcribe_params = {
    "language": input_language,
    "word_timestamps": True,
    "vad_filter": True,
    "initial_prompt": initial_prompt,
    "vad_parameters": dict(min_silence_duration_ms=1000,),
}
whisper_segments, info = model.transcribe(audio, **transcribe_params)
for whis_seg in whisper_segments:
    print(whis_seg.text.strip())
print(info)

""" Result
二零二三二四年度修訂預算
受環球利率上升
TranscriptionInfo(language='zh', language_probability=1, ....)
"""

from faster_whisper import WhisperModel
model3 = WhisperModel("large-v3",
                    device=DEVICE,
                    )
>>> print(model3.hf_tokenizer.token_to_id("<|%s|>" % "yue"))
50358
>>> print(model3.hf_tokenizer.token_to_id("<|%s|>" % "zh"))
50260


For comparison, the same test with openai/whisper:

import whisper
model = whisper.load_model("large-v2",)

input_language = 'yue'
result  = model.transcribe(audio, language=input_language)

"""
--> [154]  sot_sequence.append(sot + 1 + langs.index(self.language))

ValueError: tuple.index(x): x not in tuple

"""

import whisper
model = whisper.load_model("large-v3",)
input_language = 'yue'
result  = model.transcribe(audio, language=input_language)

"""
{'text': ' 二零二三二四年度修訂預算受環球利率上升',
 'segments': [{'id': 0,
   'seek': 0,
...}
"""
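
The `ValueError` for large-v2 follows from the lookup in the traceback line above, which derives the language token id from the position of the language code in a tuple. Below is a minimal sketch of that arithmetic. It assumes that `<|startoftranscript|>` has id 50258 and that the large-v2 tuple is the large-v3 tuple without its last entry. The tuples are hypothetical placeholders (only `en`, `zh`, and `yue` are real codes), not the actual tables from openai/whisper:

```python
# Illustrative sketch only. It mirrors the arithmetic in the traceback line
#   sot_sequence.append(sot + 1 + langs.index(self.language))
# The language tuples below are placeholders, not the real whisper tables.

SOT = 50258  # assumed id of <|startoftranscript|> in the multilingual vocabulary

# Hypothetical stand-ins: "en" and "zh" come first, "yue" is the 100th entry.
_FILLER = tuple(f"lang{i}" for i in range(2, 99))  # 97 placeholder codes
LANGS_V3 = ("en", "zh") + _FILLER + ("yue",)       # 100 languages
LANGS_V2 = LANGS_V3[:99]                           # 99 languages, no "yue"


def language_token_id(langs, language, sot=SOT):
    """Return the language token id, computed as in the traceback line.

    tuple.index raises ValueError when the language is not in the tuple,
    which is the error shown for large-v2 with language="yue".
    """
    return sot + 1 + langs.index(language)
```

Under these assumptions the ids match the large-v3 values printed earlier (50260 for `zh`, 50358 for `yue`), while the 99-entry tuple raises the same `ValueError` for `yue`.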

Question:

In openai/whisper, the language token is placed at the start of the encoded input. In faster-whisper, however, it seems that the language token is not passed to the model as expected.
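
Until the language is validated inside the library, one possible guard on the caller's side is to check the tokenizer before transcribing. This is a minimal sketch that reuses the `hf_tokenizer.token_to_id` call shown earlier; `require_language_token` is a hypothetical helper name, not part of faster-whisper:

```python
def require_language_token(tokenizer, language):
    """Return the id of the <|language|> token, or raise ValueError.

    `tokenizer` is any object with a token_to_id(str) method that returns
    None for unknown tokens, such as model.hf_tokenizer in the snippets above.
    """
    token = "<|%s|>" % language
    token_id = tokenizer.token_to_id(token)
    if token_id is None:
        raise ValueError("%s is not in this model's vocabulary" % token)
    return token_id
```

Calling `require_language_token(model.hf_tokenizer, input_language)` before `model.transcribe(...)` would then fail early for "yue" on large-v2 instead of silently continuing.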
