
non-persistent example doesn't work on Mixtral-8x7B-v0.1 #513

Open · tang-t21 opened this issue Jul 26, 2024 · 0 comments
```python
import mii

pipe = mii.pipeline("/data/mixtral/Mixtral-8x7B-v0.1")
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```

Running this with `deepspeed --num_gpus=4` reports the following error on each rank:
```
[rank0]: response = pipe(["DeepSpeed is"], max_new_tokens=128)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 597, in __call__
[rank0]:     self.generate()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/utils.py", line 31, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 117, in generate
[rank0]:     next_token_logits = self.put(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 500, in put
[rank0]:     return self.inference_engine.put(uids, tokenized_input, do_checks=False)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_v2.py", line 146, in put
[rank0]:     logits = self._model.forward(self._batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 259, in forward
[rank0]:     residual, hidden_states = self._forward_transformer(layer_idx, residual, hidden_states, wrapped_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 214, in _forward_transformer
[rank0]:     hidden_states = self.moe(hidden_states, ragged_batch_info, cur_params.moe_gate, cur_params.moe_mlp_1,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/implementations/moe/cutlass_multi_gemm.py", line 223, in forward
[rank0]:     self._mlp_1(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/kernels/cutlass_ops/moe_gemm/moe_gemm.py", line 59, in __call__
[rank0]:     self.kernel(ordered_output, ordered_input, weights, biases, total_rows_before_expert, self.act_fn)
[rank0]: RuntimeError: [FT Error][MoE][GEMM Dispatch] Arch unsupported for MoE GEMM
```
