Major Features and Improvements
Train/Eval/Export
- Support train/eval/export on CPU #27
- Support TRT export (Beta) #30 #32 #41 #43 #58 #59 #89
- Support AOT export (WIP) #79
Model
- Optimize TDM gen tree speed #33
- TDM support string id #72
- Rank and Match models support sample weight #50 #57 #63 #65
- Add zero collision hash embedding #60
- Add intervention methods for multi-target learning #49
- Add Autodis and MLP embedding for raw features #73 #75
- Add task space for multi-target learning loss #82
- Add dual augmented two-tower match model #83
- Add HSTU (WIP) #55
Feature
- pyfg support CPU without avx512 #20
- ExprFeature support l2_norm|dot|euclid_dist #35
- Add fg bucketize only mode & refactor fg_encoded to fg_mode #62
- Make default bucketize value configurable #94
- Support multi-value sequence #96
- Support vocab file #97
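For readers unfamiliar with the three operations named in the ExprFeature item above (#35), here is a minimal plain-Python sketch of the math they usually denote. It assumes `l2_norm` means the Euclidean norm of a vector; the function bodies are illustrative only and are not taken from pyfg, whose exact semantics and expression syntax may differ.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a: Sequence[float]) -> float:
    """Euclidean (L2) norm, assumed here to be sqrt(sum(x_i^2))."""
    return math.sqrt(dot(a, a))


def euclid_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `dot([1, 2, 3], [4, 5, 6])` evaluates to 32, and both `l2_norm([3, 4])` and `euclid_dist([0, 0], [3, 4])` evaluate to 5.0.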
Dataset
- Enhance stability for credential of OdpsDataset #45
- Add complex type and credential support for sampler when using OdpsDataset #52
- Support CsvDataset with null columns #56
- Negative sampler support string id #70
Config
Upgrade
Note
For TorchEasyRec 0.7.x, you should use Docker image version 0.7.
- For the GPU version (CUDA 12.4):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.7-cu124
- PyTorch: v2.6, CUDA: v12.4, FBGEMM: v1.1.0, TorchRec: v1.1.0, Python: v3.11
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.7-cpu
- PyTorch: v2.6, FBGEMM: v1.1.0, TorchRec: v1.1.0, Python: v3.11
Bug Fixes and Other Changes
- [bugfix] remove redundant sequence key in feature input names when fg_mode is DAG by @tiankongdeguiji in #21
- fix quota_name for add feature info by @chengaofei in #22
- update config delete drop feature config by @chengaofei in #23
- [feat] make docker compat with gpu driver 470 by @tiankongdeguiji in #24
- [bugfix] fix dlc tutorial doc by @tiankongdeguiji in #25
- [bugfix] fix dbmtl model doc by @tiankongdeguiji in #28
- [feat] add pai dlc and dsw dependency in docker by @tiankongdeguiji in #29
- [feat] update easyrec dinggroup qrcode by @tiankongdeguiji in #31
- [feat] update pyfg doc to 0.3.5 by @tiankongdeguiji in #34
- [bugfix] fix fg arrow handler with sample mask by @tiankongdeguiji in #38
- [feat] add unique test work dir by @tiankongdeguiji in #40
- [bugfix] add id field of negative sampler to selected columns by @tiankongdeguiji in #42
- [bugfix] prevent predict hang when subthread or subproc exception by @tiankongdeguiji in #44
- [bugfix] input_tile=3: make dataparser to get user feats before creat… by @yjjinjie in #46
- [bugfix] fix sequence feature doc by @tiankongdeguiji in #48
- [feat] optimize is_user_feat of Feature when use dag by @tiankongdeguiji in #53
- [bugfix] refine sample weight compatibility & refine label dtype check & relax predict pipeline check & fix num_rows < num_workers when use OdpsDataset by @tiankongdeguiji in #54
- [feat] add doc for training with maxcompute tables on DLC by @yanzhen1233 in #47
- create fg will use resource name by @chengaofei in #64
- [bugfix] fix is_sparse of LookupFeature and MatchFeature when use vocab_dict by @tiankongdeguiji in #66
- [bugfix] fix odps quota in hitrate.py & refine error info of CsvReader and ParquetReader by @tiankongdeguiji in #67
- [bugfix] fix mtl model label in ut by @tiankongdeguiji in #68
- [bugfix] fix calculate_shard_storages to handle optimizer correctly by @tiankongdeguiji in #69
- [feat] add LOG_LEVEL environ variable by @tiankongdeguiji in #71
- [bugfix] fix predict when num_workers = 0 by @tiankongdeguiji in #74
- [bugfix] fix duplicate server launch error in odps sampler test by @tiankongdeguiji in #76
- [feat] refactor batch_size to tile_size in Batch dataclass by @tiankongdeguiji in #77
- [feat] add total_loss to the plogger and summary_writer by @eric-gecheng in #78
- [bugfix] fix weighted feature when INPUT_TILE=2 by @tiankongdeguiji in #80
- [bugfix] fix negative sample table with multiple partitions by @tiankongdeguiji in #81
- [bugfix] readme typo by @eric-gecheng in #85
- [doc] fix task space doc error by @chengaofei in #86
- [bugfix] add div_no_nan and prevent divide by zero loss weight by @tiankongdeguiji in #88
- [feat] remove sample weight and labels when export by @tiankongdeguiji in #91
- [feat] configure the shell to be bash by default in docker environments by @tiankongdeguiji in #92
- [doc] creat fg json doc add upload fg json to mc method by @chengaofei in #95
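The `div_no_nan` fix in #88 above addresses division by a zero loss weight. As a rough illustration of the idea, the sketch below uses the common convention that a safe division returns 0 when the denominator is 0. The helper `weighted_mean_loss` is hypothetical, and neither function reproduces the TorchEasyRec implementation, which operates on tensors.

```python
from typing import Sequence


def div_no_nan(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / denominator


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical helper: weighted average of per-sample losses.

    When every weight is zero, the result is 0.0 instead of raising
    ZeroDivisionError (or producing NaN in tensor code).
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    weighted_sum = sum(l * w for l, w in zip(losses, weights))
    return div_no_nan(weighted_sum, sum(weights))
```

With equal weights, `weighted_mean_loss([1.0, 3.0], [1.0, 1.0])` gives 2.0; with all-zero weights it gives 0.0.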
New Contributors
- @yjjinjie made their first contribution in #30
- @eric-gecheng made their first contribution in #50
- @yanzhen1233 made their first contribution in #47
- @Dave-AdamsWANG made their first contribution in #49
- @chengmengli06 made their first contribution in #79
- @iWelkin-coder made their first contribution in #55
Full Changelog: v0.6.0...v0.7.0