Running this with `deepspeed --num_gpus=4` reports the following error on each rank:
```
[rank0]: response = pipe(["DeepSpeed is"], max_new_tokens=128)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 597, in __call__
[rank0]: self.generate()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/utils.py", line 31, in wrapper
[rank0]: return func(self, *args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 117, in generate
[rank0]: next_token_logits = self.put(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 500, in put
[rank0]: return self.inference_engine.put(uids, tokenized_input, do_checks=False)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_v2.py", line 146, in put
[rank0]: logits = self._model.forward(self._batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 259, in forward
[rank0]: residual, hidden_states = self._forward_transformer(layer_idx, residual, hidden_states, wrapped_batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 214, in _forward_transformer
[rank0]: hidden_states = self.moe(hidden_states, ragged_batch_info, cur_params.moe_gate, cur_params.moe_mlp_1,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/implementations/moe/cutlass_multi_gemm.py", line 223, in forward
[rank0]: self._mlp_1(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/kernels/cutlass_ops/moe_gemm/moe_gemm.py", line 59, in __call__
[rank0]: self.kernel(ordered_output, ordered_input, weights, biases, total_rows_before_expert, self.act_fn)
RuntimeError: [FT Error][MoE][GEMM Dispatch] Arch unsupported for MoE GEMM
```