Notes on fixing a bug when running MiniCPM-o-2.6 #732

Open
thughy opened this issue Jan 16, 2025 · 5 comments

thughy commented Jan 16, 2025

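For example, take the code given in the Multimodal Live Streaming section of the README. If we use the snippet below exactly as written, it will not run, because `model` and `tokenizer` are never defined: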

`import math
import numpy as np
from PIL import Image
from moviepy.editor import VideoFileClip
import tempfile
import librosa
import soundfile as sf

def get_video_chunk_content(video_path, flatten=True):
    video = VideoFileClip(video_path)
    print('video_duration:', video.duration)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
        temp_audio_file_path = temp_audio_file.name
        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
    num_units = math.ceil(video.duration)

    # 1 frame + 1s audio chunk
    contents = []
    for i in range(num_units):
        frame = video.get_frame(i+1)
        image = Image.fromarray((frame).astype(np.uint8))
        audio = audio_np[sr*i:sr*(i+1)]
        if flatten:
            contents.extend(["<unit>", image, audio])
        else:
            contents.append(["<unit>", image, audio])

    return contents

video_path = "/path/to/video"
sys_msg = model.get_sys_prompt(mode='omni', language='en')

contents = get_video_chunk_content(video_path)
msg = {"role": "user", "content": contents}
msgs = [sys_msg, msg]

generate_audio = True
output_audio_path = 'output.wav'

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.5,
    max_new_tokens=4096,
    omni_input=True, # please set omni_input=True when omni inference
    use_tts_template=True,
    generate_audio=generate_audio,
    output_audio_path=output_audio_path,
    max_slice_nums=1,
    use_image_id=False,
    return_dict=True
)
print(res)`
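
The per-second slicing that `get_video_chunk_content` applies to the audio can be checked on its own, without a video file or a model. The following is a minimal sketch of that slicing only; `split_into_units` is a hypothetical helper name, and the sample rate of 4 and the toy sample list are made up for illustration (the snippet above uses 16000 Hz).

```python
import math

def split_into_units(samples, sr, duration_s):
    """Slice a 1-D sample sequence into one chunk per (started) second."""
    num_units = math.ceil(duration_s)
    return [samples[sr * i : sr * (i + 1)] for i in range(num_units)]

# Toy input: 2.5 "seconds" of audio at a made-up rate of 4 samples per second.
sr = 4
samples = list(range(10))
units = split_into_units(samples, sr, duration_s=2.5)

# The last unit only covers the remaining half second, so it is shorter.
print([len(u) for u in units])  # prints [4, 4, 2]
```

Because `num_units` is rounded up, the final audio chunk is shorter than one second whenever the duration is not a whole number of seconds.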


thughy commented Jan 16, 2025

In practice, the following model-loading code has to be added before that snippet to make it run:

import torch
from transformers import AutoModel, AutoTokenizer

torch.manual_seed(100)

# Load the model and tokenizer that the README snippet uses without defining
model = AutoModel.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
model.init_tts()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)
print('finish loading model')
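
The comment in the loading code says `attn_implementation` should be `sdpa` or `flash_attention_2`, not eager. One way to choose between the two at runtime is sketched below. It is only an illustration: `pick_attn_implementation` is a hypothetical helper, and it assumes that FlashAttention 2 is provided by a package importable as `flash_attn`. Finding the package does not guarantee that the GPU supports it.

```python
import importlib.util

def pick_attn_implementation():
    """Return 'flash_attention_2' if flash_attn is importable, else 'sdpa'."""
    # Assumption: the FlashAttention 2 package is importable as `flash_attn`.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

attn_impl = pick_attn_implementation()
print(attn_impl)
```

The returned string could then be passed as `attn_implementation=attn_impl` in the `from_pretrained` call.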

YuzaChongyi (Collaborator) commented

Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

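cheng358 commented

> Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

Is there a demo for running minicpm-o-2.6 with vLLM? I deployed it as described in the README, but calls fail with all kinds of errors, and requests in the format given in the docs do not go through.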


cheng358 commented Jan 17, 2025

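> Thanks for sharing. We will update the example code on GitHub; the code on Hugging Face is more complete.

For example, take the following request (the text prompt asks "What is the gender of the person in the image?"):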
curl --location --request POST 'http://101.230.144.224:12341/v1/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "MiniCPM",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "图中人物是什么性别"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                }
            ]
        }
    ],
    "stream": false
}'

This works on 2.6, but on o-2.6 it returns:
"object": "error",
"message": "[{'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'model': 'MiniCPM', 'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': '图中人物是什么性别'}, {'type': 'image_url', 'image_url': {'url': '

To save space, I pasted only part of the image's base64 data.
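
The error above reports a missing `prompt` field. In the OpenAI-style API, `/v1/completions` expects `prompt`, while a `messages` list is normally sent to `/v1/chat/completions`. Whether that is the whole cause here is not confirmed in this thread. The sketch below only builds such a chat request without sending it. `build_chat_request` is a hypothetical helper, and the host, the `data:image/jpeg;base64,` prefix and the placeholder image string are assumptions.

```python
import json

def build_chat_request(base_url, model, text, image_b64):
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                # Assumption: the image is passed as a JPEG data URL.
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64," + image_b64}},
            ],
        }],
        "stream": False,
    }
    return url, json.dumps(payload, ensure_ascii=False)

# Hypothetical host and placeholder image data, for illustration only.
url, body = build_chat_request(
    "http://localhost:8000", "MiniCPM", "图中人物是什么性别", "<base64-data>")
print(url)
```

The resulting `url` and `body` can be passed to any HTTP client with the header `Content-Type: application/json`.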

YuzaChongyi (Collaborator) commented

We have updated the minicpmo branch of vLLM. Please pull the latest code and try again; you can also refer to #742.

YuzaChongyi self-assigned this Jan 18, 2025