Release v0.13.2 Patch release · microsoft/DeepSpeed

What's Changed

Update version.txt after 0.13.1 release by @mrwyattii in #5002
Support exclude_frozen_parameters for save_16bit_model by @LZHgrla in #4999
Allow nightly tests dispatch by @mrwyattii in #5014
Enable hpz based on secondary tensor presence by @HeyangQin in #4906
Enable workflow dispatch on all workflows by @loadams in #5016
[minor] improve code quality and readability by @ByronHsu in #5011
Update falcon fused type order by @Yejing-Lai in #5007
Fix error report of DSElasticAgent._set_master_addr_port() by @RobinDong in #4985
DS #4993 #662 : autotune single node hostfile bugfix by @oushu1zhangxiangxuan1 in #4996
[minor] Improve logging for multiprocesses by @ByronHsu in #5004
deepspeed/launcher: add launcher_helper as each rank's start portal by @YizhouZ in #4699
Graph capture support on HPU accelerators by @deepcharm in #5013
launcher/launcher_helper.py: fix PMI name and add EnvironmentError by @YizhouZ in #5025
Remove MI100 badge from landing page by @mrwyattii in #5036
Remove coverage reports from workflows and fix for inference CI by @loadams in #5028
Remove Megatron-DeepSpeed CI workflow by @mrwyattii in #5038
Fix P40 CI failures by @mrwyattii in #5037
Fix for nightly torch CI by @mrwyattii in #5039
Fix nv-accelerate and nv-torch-latest-v100. by @loadams in #5035
update inference pages to point to FastGen by @mrwyattii in #5029
launcher_helper: enable fds passing by @YizhouZ in #5042
Fix nv-torch-latest-cpu CI by @mrwyattii in #5045
[NPU] Add NPU to support hybrid engine by @CurryRice233 in #4831
MoE type hints by @ringohoffman in #5043
[doc] update inference related docs from mp_size to tensor_parallel for TP by @yundai424 in #5048
Fix broken model names in inference CI by @mrwyattii in #5053
[NPU] Change log level to debug by @CurryRice233 in #5051
Delay reduce-scatter for ZeRO3 leaf modules by @tohtana in #5008
Optimize grad_norm calculations by reducing device/host dependency by @nelyahu in #4974
load linear layer weight with given dtype by @polisettyvarma in #4044
Update import for changes to latest diffusers by @mrwyattii in #5065
adding hccl to init_distributed function description by @nelyahu in #5034
[Zero++ qgZ] Fall back to reduce_scatter if tensor.numel() % (2 * global_world_size) != 0 by @ByronHsu in #5056
Make batch size documentation clearer by @segyges in #5072
[doc/1-line change] default stage3_param_persistence_threshold is wrong in the doc by @ByronHsu in #5073
Further refactor deepspeed.moe.utils + deepspeed.moe.layer type hints by @ringohoffman in #5060
Fix verification for ZeRO3 leaf module by @tohtana in #5074
Stop tracking backward chain of broadcast in initialization by @tohtana in #5075
Update torch version for nv-torch-latest-cpu by @loadams in #5086
Add backwards compatibility w/ older versions of diffusers (<0.25.0) by @lekurile in #5083
Enable torch.compile with ZeRO (Experimental) by @tohtana in #4878
Update nv-accelerate to latest torch by @loadams in #5040
HPU Accelerator: fix supported_dtypes API by @nelyahu in #5094
[NPU] replace 'cuda' with get_accelerator().device_name() by @minchao-sun in #5095
optimize clip_grad_norm_ function by @mmhab in #4915
[xs] fix ZEROPP convergence test by @yundai424 in #5061
Switch hasattr check from compile to compiler by @loadams in #5096
Split is_synchronized_device api to multiple apis by @BacharL in #5026
47% FastGen speedup for low workload - refactor allocator by @HeyangQin in #5090
Support exclude_frozen_parameters for zero_to_fp32.py script by @andstor in #4979
Fix alignment of optimizer states when loading by @tohtana in #5105
Skip Triton import for AMD by @lekurile in #5110
Add HIP conversion file outputs to .gitignore by @lekurile in #5111
Remove optimizer step on initialization by @tohtana in #5104
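
The ZeRO++ qgZ entry above (#5056) states its fallback condition directly: use reduce_scatter when `tensor.numel() % (2 * global_world_size) != 0`. As a minimal sketch of that check only, the snippet below works on plain integers. The function name `should_fall_back_to_reduce_scatter` is hypothetical and is not DeepSpeed's API; the actual change lives in the PR.

```python
def should_fall_back_to_reduce_scatter(numel: int, global_world_size: int) -> bool:
    """Return True when the element count is not divisible by 2 * world size.

    Hypothetical helper illustrating the condition quoted in the release note;
    it is not part of DeepSpeed.
    """
    if global_world_size <= 0:
        raise ValueError("global_world_size must be positive")
    return numel % (2 * global_world_size) != 0


if __name__ == "__main__":
    # 16 elements across 4 ranks: 16 % 8 == 0, so no fallback is needed.
    print(should_fall_back_to_reduce_scatter(16, 4))
    # 12 elements across 4 ranks: 12 % 8 == 4, so the fallback applies.
    print(should_fall_back_to_reduce_scatter(12, 4))
```

In real code the first argument would come from `tensor.numel()` and the second from the distributed world size.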

New Contributors

@ByronHsu made their first contribution in #5011
@RobinDong made their first contribution in #4985
@oushu1zhangxiangxuan1 made their first contribution in #4996
@yundai424 made their first contribution in #5048
@segyges made their first contribution in #5072
@andstor made their first contribution in #4979

Full Changelog: v0.13.1...v0.13.2
