In my evaluations, Gemini 1.5 Flash 002 gave the best OCR results, outperforming GPT-4o and Claude 3.5 Sonnet. I used an AI model instead of dedicated OCR software because some Chinese books are typeset vertically, and the open-source OCR tools I tried could not handle them. This took about a day to build, and I hope it helps others. It runs in Google Colab, where the most convenient way to set the API key is at the location shown in the screenshot below.
![Screenshot 2024-10-07 at 7 48 51 PM](https://private-user-images.githubusercontent.com/7285298/374362554-87e9eb9b-23f1-4ef6-a75a-49107d0b57fb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTY1ODYsIm5iZiI6MTczOTAxNjI4NiwicGF0aCI6Ii83Mjg1Mjk4LzM3NDM2MjU1NC04N2U5ZWI5Yi0yM2YxLTRlZjYtYTc1YS00OTEwN2QwYjU3ZmIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMTIwNDQ2WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NjdjNWJkYjhhZjUxMTg4YTdlNTA1MTEzOThhYzA1OTVlYjIwOWQ3MTc2MWM0Y2UwYzE4NjE5MTRmYTlhMjg4ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.MkMYRePmcW2dHCCOIAk8Aviu9E_M-6JOZnHGq3hAp5c)
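The snippet below is a minimal sketch of the workflow described above, not the author's actual code. It assumes the `google-generativeai` Python SDK and Pillow are installed, and that the key is stored in Colab Secrets under the name `GOOGLE_API_KEY` (a hypothetical name; use whatever name you chose). The prompt wording and the helper names `get_api_key`, `build_ocr_prompt`, and `ocr_page` are also illustrative assumptions. The prompt spells out the reading order for vertical text (columns top to bottom, right to left), since that layout is what tripped up conventional OCR.

```python
import os
from typing import Optional

# Hypothetical secret name: match whatever you saved in the Colab Secrets panel.
SECRET_NAME = "GOOGLE_API_KEY"
MODEL_NAME = "gemini-1.5-flash-002"


def get_api_key(secret_name: str = SECRET_NAME) -> Optional[str]:
    """Read the key from Colab Secrets when running in Colab, else from an env var."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(secret_name)
    except ImportError:
        # Outside Colab: fall back to an environment variable of the same name.
        return os.environ.get(secret_name)


def build_ocr_prompt(vertical: bool = True) -> str:
    """Illustrative prompt for transcribing a scanned Chinese page."""
    prompt = ("Transcribe all Chinese text on this page exactly as written. "
              "Output plain text only.")
    if vertical:
        # Traditional vertical typesetting: columns run top to bottom, right to left.
        prompt += (" The page is typeset vertically: read each column from top "
                   "to bottom, and read the columns from right to left.")
    return prompt


def ocr_page(image_path: str, vertical: bool = True) -> str:
    """Send one page image to Gemini and return the transcribed text."""
    # Imported lazily so the helpers above work even without the SDK installed.
    import google.generativeai as genai  # pip install google-generativeai
    from PIL import Image

    api_key = get_api_key()
    if not api_key:
        raise RuntimeError(f"No API key found under {SECRET_NAME!r}")
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(MODEL_NAME)
    response = model.generate_content(
        [build_ocr_prompt(vertical), Image.open(image_path)]
    )
    return response.text
```

In a Colab cell you would then call something like `print(ocr_page("page_001.png"))`, where `page_001.png` stands in for one of your scanned pages. Passing the prompt and the image together in one `generate_content` call keeps each page independent, which makes it easy to loop over a folder of scans.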