In my evaluations, Gemini 1.5 Flash 002 provided the best OCR performance, outperforming GPT-4o and Claude 3.5 Sonnet. I chose to use AI instead of conventional OCR software because some Chinese books use vertical text layout, which the open-source OCR solutions I tried failed to handle. Developed in about a day.

jinpengchina/pdf_to_txt_gemini_ai


In my evaluations, Gemini 1.5 Flash 002 delivered the best OCR results, ahead of GPT-4o and Claude 3.5 Sonnet. I use an AI model rather than conventional OCR software because some Chinese books are typeset with vertical text, and the open-source OCR tools I tried could not handle that layout. This took about a day to build; I hope it is useful to others. Run it in Google Colab, where the most convenient way to set the API key is at the location indicated in the screenshot below.

Screenshot 2024-10-07 at 7 48 51 PM
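
The README does not show the project's code, so the following is only a minimal sketch of how a PDF-to-text pipeline of this kind could look, not the repository's actual implementation. It assumes the `google-generativeai` and `pdf2image` packages (the latter needs poppler installed), a Colab secret or environment variable named `GOOGLE_API_KEY`, and a prompt whose wording is invented here for illustration. The function names (`load_api_key`, `make_gemini_ocr`, `pdf_pages`, `ocr_pages`) are hypothetical.

```python
import os
from typing import Callable, Iterable, List

MODEL_NAME = "gemini-1.5-flash-002"  # the model the README reports as best for OCR

# Hypothetical prompt; the original project's wording is not given in the README.
OCR_PROMPT = (
    "Transcribe all text in this page image. The page may use vertical "
    "Chinese layout: read columns top to bottom, right to left. "
    "Return plain text only."
)


def load_api_key() -> str:
    """Read the key from Colab secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get("GOOGLE_API_KEY")  # assumed secret name
    except ImportError:
        return os.environ["GOOGLE_API_KEY"]


def make_gemini_ocr(api_key: str) -> Callable[[object], str]:
    """Return a function that sends one PIL image to Gemini and returns its text."""
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(MODEL_NAME)

    def ocr_page(image) -> str:
        response = model.generate_content([OCR_PROMPT, image])
        return response.text

    return ocr_page


def pdf_pages(pdf_path: str, dpi: int = 200):
    """Yield each PDF page as a PIL image, rendered with pdf2image."""
    from pdf2image import convert_from_path  # pip install pdf2image

    yield from convert_from_path(pdf_path, dpi=dpi)


def ocr_pages(pages: Iterable[object], ocr_page: Callable[[object], str]) -> str:
    """OCR pages in order and join them, with a marker line before each page."""
    texts: List[str] = []
    for number, page in enumerate(pages, start=1):
        texts.append(f"--- page {number} ---\n{ocr_page(page).strip()}")
    return "\n\n".join(texts)


# Example use in a Colab cell (the file name is a placeholder):
#   ocr = make_gemini_ocr(load_api_key())
#   text = ocr_pages(pdf_pages("book.pdf"), ocr)
#   open("book.txt", "w", encoding="utf-8").write(text)
```

Passing the OCR function into `ocr_pages` as a parameter keeps the page loop independent of the model, so a different model can be swapped in for comparison without touching the rest of the pipeline.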
