An attempt to learn and read a graphical character set (in this instance, subtitles on UK digital terrestrial broadcasts) using Python.
I've not touched this in ages, but it might still work! Included are a sample database built from the Tiresias font (used by the BBC and others) and some sample titles to learn from and test with.
YMMV
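
For a rough idea of what "learn and read" can mean for a bitmap character set, here is a minimal sketch. It is not the code in this repository: the function names (`split_glyphs`, `learn`, `read`), the `'#'`/`'.'` bitmap format and the plain-dict "database" are all assumptions made for illustration. It assumes glyphs are separated by at least one blank pixel column and are matched exactly, and it ignores word spacing.

```python
# Hypothetical sketch, not the implementation shipped with this project.
# A bitmap is a list of equal-length strings, where '#' marks a lit pixel.

def split_glyphs(rows):
    """Cut a bitmap into glyphs wherever a fully blank column appears."""
    width = len(rows[0]) if rows else 0
    glyphs, start = [], None
    # Run one column past the edge so a glyph touching the right side is closed.
    for x in range(width + 1):
        lit = x < width and any(row[x] == "#" for row in rows)
        if lit and start is None:
            start = x  # first column of a new glyph
        elif not lit and start is not None:
            glyphs.append(tuple(row[start:x] for row in rows))
            start = None
    return glyphs

def learn(database, rows, text):
    """Store each glyph of a labelled sample under its known character."""
    glyphs = split_glyphs(rows)
    if len(glyphs) != len(text):
        raise ValueError("glyph count does not match label length")
    for glyph, char in zip(glyphs, text):
        database[glyph] = char

def read(database, rows):
    """Look up each glyph; unknown glyphs come back as '?'."""
    return "".join(database.get(g, "?") for g in split_glyphs(rows))

# Learn from a sample title labelled "HI", then read a new bitmap.
db = {}
learn(db, ["#.#.###",
           "###..#.",
           "#.#.###"], "HI")
print(read(db, ["###.#.#",
                ".#..###",
                "###.#.#"]))  # prints "IH"
```

Exact matching like this only has a chance because broadcast subtitles are rendered from a fixed font; anything anti-aliased or scaled would need a fuzzier comparison.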