Running this with `deepspeed --num_gpus=4` reports the following error on each rank:
```
[rank0]: response = pipe(["DeepSpeed is"], max_new_tokens=128)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 597, in __call__
[rank0]: self.generate()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/utils.py", line 31, in wrapper
[rank0]: return func(self, *args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 117, in generate
[rank0]: next_token_logits = self.put(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 500, in put
[rank0]: return self.inference_engine.put(uids, tokenized_input, do_checks=False)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_v2.py", line 146, in put
[rank0]: logits = self._model.forward(self._batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 259, in forward
[rank0]: residual, hidden_states = self._forward_transformer(layer_idx, residual, hidden_states, wrapped_batch)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 214, in _forward_transformer
[rank0]: hidden_states = self.moe(hidden_states, ragged_batch_info, cur_params.moe_gate, cur_params.moe_mlp_1,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/implementations/moe/cutlass_multi_gemm.py", line 223, in forward
[rank0]: self._mlp_1(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/kernels/cutlass_ops/moe_gemm/moe_gemm.py", line 59, in __call__
[rank0]: self.kernel(ordered_output, ordered_input, weights, biases, total_rows_before_expert, self.act_fn)
RuntimeError: [FT Error][MoE][GEMM Dispatch] Arch unsupported for MoE GEMM
```