Describe the bug
I read https://pycantonese.org/jyutping.html and understand that the corpora used for characters_to_jyutping are (i) the HKCanCor corpus data included in the PyCantonese library and (ii) the rime-cantonese data.
The issue I found is that at least one character, 到, gets an incorrect Jyutping reading, both on its own and in words such as 感到.
To reproduce
pycantonese.characters_to_jyutping('到')
[('到', 'dou2')]
pycantonese.characters_to_jyutping('感到')
[('感到', 'gam2dou2')]
pycantonese.characters_to_jyutping('到底')
[('到底', 'dou3dai2')]
Expected behavior
According to https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/, 到 should be dou3, so the expected results are:
pycantonese.characters_to_jyutping('到')
[('到', 'dou3')]
pycantonese.characters_to_jyutping('感到')
[('感到', 'gam2dou3')]
pycantonese.characters_to_jyutping('到底')
[('到底', 'dou3dai2')]
Is there any way to resolve this, so that pycantonese.characters_to_jyutping returns dou3 for 到, including in 感到?
Thanks!
Hi, sorry for not replying earlier. When rime-cantonese and HKCanCor disagree, the current code prefers the rime-cantonese data. I'll have to dig into what the included rime-cantonese data looks like. Maybe the upstream rime-cantonese data has been updated and I could just use the updated data, or I could override these known cases. Thank you for reporting this!
For my purposes, I'd need an automatic way to tell which entry to pick for a character's Jyutping (or a word's, if this also happens in word.csv). Is it safe to always choose the last one, or is there another lookup I should use?
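On the question of whether it is safe to always take the last entry: nothing in this thread says that the order of entries in char.csv or word.csv is meaningful, so one alternative worth considering is to prefer the reading that occurs most often in a romanized corpus such as HKCanCor. Below is a minimal sketch, assuming the corpus has already been reduced to (word, jyutping) pairs; `most_frequent_readings` is a hypothetical helper, not part of PyCantonese, and the extraction of those pairs from HKCanCor is not shown.

```python
import re
from collections import Counter, defaultdict

# A Jyutping syllable: lowercase letters followed by a tone number 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def most_frequent_readings(pairs):
    """Map each character to its most frequent Jyutping syllable.

    `pairs` is an iterable of (word, jyutping) tuples from a romanized corpus
    (hypothetical input format). Words whose Jyutping can't be split into
    exactly one syllable per character are skipped, since their characters
    can't be aligned with syllables reliably.
    """
    counts = defaultdict(Counter)
    for word, jyutping in pairs:
        if not jyutping:
            continue  # no reading recorded for this word
        syllables = SYLLABLE.findall(jyutping)
        # Require one syllable per character and no leftover text.
        if len(syllables) != len(word) or "".join(syllables) != jyutping:
            continue
        for char, syllable in zip(word, syllables):
            counts[char][syllable] += 1
    # most_common(1) gives the highest count; ties go to the reading seen first.
    return {char: c.most_common(1)[0][0] for char, c in counts.items()}
```

For example, with the pairs [('感到', 'gam2dou3'), ('到', 'dou3'), ('到', 'dou2'), ('到底', 'dou3dai2')], 到 maps to dou3 (three occurrences against one). Frequency only yields one default reading per character, so characters whose reading genuinely depends on the word would still need a word-level lookup first.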
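The maintainer's option of overriding known cases could also be applied on the user side by post-processing the output of characters_to_jyutping. A minimal sketch, assuming the output is a list of (word, jyutping) tuples as in the examples in this issue, and that a word without a known reading may come back with None (handled here just in case); `CHAR_OVERRIDES` and `fix_known_readings` are hypothetical names, not PyCantonese API.

```python
import re

# Hypothetical per-character overrides for readings believed to be wrong in
# the bundled data. The override is applied regardless of the surrounding word.
CHAR_OVERRIDES = {"到": "dou3"}

# A Jyutping syllable: lowercase letters followed by a tone number 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def fix_known_readings(pairs, overrides=CHAR_OVERRIDES):
    """Patch (word, jyutping) pairs such as those returned by characters_to_jyutping.

    Each word's Jyutping is split into syllables; if there is exactly one
    syllable per character, every overridden character gets its fixed
    syllable. Words that can't be aligned, or have no reading, are left as-is.
    """
    fixed = []
    for word, jyutping in pairs:
        if jyutping is not None:
            syllables = SYLLABLE.findall(jyutping)
            # Only patch when syllables line up one-to-one with characters.
            if len(syllables) == len(word) and "".join(syllables) == jyutping:
                syllables = [overrides.get(char, syl) for char, syl in zip(word, syllables)]
                jyutping = "".join(syllables)
        fixed.append((word, jyutping))
    return fixed
```

For example, fix_known_readings(pycantonese.characters_to_jyutping('感到')) would turn [('感到', 'gam2dou2')] into [('感到', 'gam2dou3')]. A per-character override like this is only appropriate if you want that character read the same way everywhere; characters whose reading depends on the word would need word-level overrides instead.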