
bad results with OCR inference #76

Open
katie312 opened this issue Feb 12, 2025 · 9 comments

Comments


katie312 commented Feb 12, 2025

I input a pic like this:

[input image attached]

It seems like a very easy task, but there are a lot of problems in the output (the structure is fine, but some words are wrong). Is there something wrong with the inference code, or does the model only support English?

the code:

import torch
from transformers import AutoModelForCausalLM

from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl2.utils.io import load_pil_images


# specify the path to the model
model_path = "/deepseek-vl2-model"
vl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

## single image conversation example
## Please note that <|ref|> and <|/ref|> are designed specifically for the object localization feature. These special tokens are not required for normal conversations.
## If you would like to experience the grounded captioning functionality (responses that include both object localization and reasoning), you need to add the special token <|grounding|> at the beginning of the prompt. Examples could be found in Figure 9 of our paper.
conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\n把图片里面的所有内容进行ocr识别,markdown格式输出",
        "images": ["./396_2352fdc7zf.png"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True,
    system_prompt=""
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=False)
print(f"{prepare_inputs['sft_format'][0]}", answer)
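One more thing I'm unsure about: max_new_tokens=512 may be too small for a full page rendered as markdown, so the end of the output could simply be getting truncated. The variant below only raises the token budget (untested on the real document):

outputs = vl_gpt.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=2048,  # larger budget so a dense page of markdown is not cut off
    do_sample=False,
    use_cache=True
)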

ibbol commented Feb 12, 2025

It works fine for me.

[image attached]


ibbol commented Feb 12, 2025

[image attached]

katie312 (author) commented:

[image attached]

Which model is the online demo running? I'm using the locally downloaded small version.
The original image actually isn't this one; it's company data, so I can't share it, but the layout is roughly the same, just a few icons plus a paragraph of text, and the output quality is really poor 🥲


ibbol commented Feb 12, 2025

> [image]
>
> Which model is the online demo running? I'm using the locally downloaded small version. The original image actually isn't this one; it's company data, so I can't share it, but the layout is roughly the same, just a few icons plus a paragraph of text, and the output quality is really poor 🥲

The tiny version. I'd also like to deploy the small version, but I have four 24GB 4090s and I don't know how to allocate the VRAM; splitting the layers the way others described still doesn't seem to leave enough memory.
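For reference, what I've been experimenting with is roughly the sketch below. It assumes the remote-code model class works with the standard transformers/accelerate device_map and max_memory sharding, which I haven't confirmed for deepseek-vl2; the model path and per-card limits are placeholders.

import torch
from transformers import AutoModelForCausalLM

# sketch only: assumes the trust_remote_code model class supports
# accelerate-style sharding via device_map (not verified for deepseek-vl2)
model_path = "/deepseek-vl2-small-model"  # placeholder local path
vl_gpt = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",                          # let accelerate spread layers across the 4 GPUs
    max_memory={i: "20GiB" for i in range(4)},  # leave headroom on each 24GB 4090
).eval()

If some layers still OOM during generation, lowering the per-card max_memory should force a more even split at the cost of extra cross-GPU traffic.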

katie312 (author) commented:

> [image]
>
> Which model is the online demo running? I'm using the locally downloaded small version. The original image actually isn't this one; it's company data, so I can't share it, but the layout is roughly the same, just a few icons plus a paragraph of text, and the output quality is really poor 🥲
>
> The tiny version. I'd also like to deploy the small version, but I have four 24GB 4090s and I don't know how to allocate the VRAM; splitting the layers the way others described still doesn't seem to leave enough memory.

I'm running inference on an L40 and it does run, but why does tiny perform better than small? I seriously suspect there's a problem with my code 🥲 Could you try a more complex document and see how it does?


ibbol commented Feb 12, 2025

[two images attached]


ibbol commented Feb 12, 2025

[image attached]

Giserlei123 commented:

Using the web demo UI directly, the recognition quality is very good, but when I run this same code myself the results are also very poor.


ibbol commented Feb 12, 2025

[image attached]
The OP's code seems to work fine for me?
