v0.13.2 Patch release
What's Changed
- Update version.txt after 0.13.1 release by @mrwyattii in #5002
- Support exclude_frozen_parameters for save_16bit_model by @LZHgrla in #4999
- Allow nightly tests dispatch by @mrwyattii in #5014
- Enable hpz based on secondary tensor presence by @HeyangQin in #4906
- Enable workflow dispatch on all workflows by @loadams in #5016
- [minor] improve code quality and readability by @ByronHsu in #5011
- Update falcon fused type order by @Yejing-Lai in #5007
- Fix error report of DSElasticAgent._set_master_addr_port() by @RobinDong in #4985
- DS #4993 #662: autotune single-node hostfile bugfix by @oushu1zhangxiangxuan1 in #4996
- [minor] Improve logging for multiprocesses by @ByronHsu in #5004
- deepspeed/launcher: add launcher_helper as each rank's start portal by @YizhouZ in #4699
- Graph capture support on HPU accelerators by @deepcharm in #5013
- launcher/launcher_helper.py: fix PMI name and add EnvironmentError by @YizhouZ in #5025
- Remove MI100 badge from landing page by @mrwyattii in #5036
- Remove coverage reports from workflows and fix for inference CI by @loadams in #5028
- Remove Megatron-DeepSpeed CI workflow by @mrwyattii in #5038
- Fix P40 CI failures by @mrwyattii in #5037
- Fix for nightly torch CI by @mrwyattii in #5039
- Fix nv-accelerate and nv-torch-latest-v100 by @loadams in #5035
- update inference pages to point to FastGen by @mrwyattii in #5029
- launcher_helper: enable fds passing by @YizhouZ in #5042
- Fix nv-torch-latest-cpu CI by @mrwyattii in #5045
- [NPU] Add NPU to support hybrid engine by @CurryRice233 in #4831
- MoE type hints by @ringohoffman in #5043
- [doc] update inference related docs from mp_size to tensor_parallel for TP by @yundai424 in #5048
- Fix broken model names in inference CI by @mrwyattii in #5053
- [NPU] Change log level to debug by @CurryRice233 in #5051
- Delay reduce-scatter for ZeRO3 leaf modules by @tohtana in #5008
- Optimize grad_norm calculations by reducing device/host dependency by @nelyahu in #4974
- load linear layer weight with given dtype by @polisettyvarma in #4044
- Update import for changes to latest diffusers by @mrwyattii in #5065
- adding hccl to init_distributed function description by @nelyahu in #5034
- [Zero++ qgZ] Fall back to reduce_scatter if tensor.numel() % (2 * global_world_size) != 0 by @ByronHsu in #5056
- Make batch size documentation clearer by @segyges in #5072
- [doc/1-line change] default stage3_param_persistence_threshold is wrong in the doc by @ByronHsu in #5073
- Further refactor deepspeed.moe.utils + deepspeed.moe.layer type hints by @ringohoffman in #5060
- Fix verification for ZeRO3 leaf module by @tohtana in #5074
- Stop tracking backward chain of broadcast in initialization by @tohtana in #5075
- Update torch version for nv-torch-latest-cpu by @loadams in #5086
- Add backwards compatibility w/ older versions of diffusers (<0.25.0) by @lekurile in #5083
- Enable torch.compile with ZeRO (Experimental) by @tohtana in #4878
- Update nv-accelerate to latest torch by @loadams in #5040
- HPU Accelerator: fix supported_dtypes API by @nelyahu in #5094
- [NPU] replace 'cuda' with get_accelerator().device_name() by @minchao-sun in #5095
- optimize clip_grad_norm_ function by @mmhab in #4915
- [xs] fix ZEROPP convergence test by @yundai424 in #5061
- Switch hasattr check from compile to compiler by @loadams in #5096
- Split is_synchronized_device api to multiple apis by @BacharL in #5026
- 47% FastGen speedup for low workload - refactor allocator by @HeyangQin in #5090
- Support exclude_frozen_parameters for zero_to_fp32.py script by @andstor in #4979
- Fix alignment of optimizer states when loading by @tohtana in #5105
- Skip Triton import for AMD by @lekurile in #5110
- Add HIP conversion file outputs to .gitignore by @lekurile in #5111
- Remove optimizer step on initialization by @tohtana in #5104
New Contributors
- @ByronHsu made their first contribution in #5011
- @RobinDong made their first contribution in #4985
- @oushu1zhangxiangxuan1 made their first contribution in #4996
- @yundai424 made their first contribution in #5048
- @segyges made their first contribution in #5072
- @andstor made their first contribution in #4979
Full Changelog: v0.13.1...v0.13.2