Use Lazy Loaders #1536

Merged: 21 commits, Feb 10, 2025

@dafnapension (Collaborator) commented on Jan 21, 2025

Load the data only when it actually needs to stream through Unitxt, not merely to learn the split names from it (at the beginning of a Benchmark, for example).
Also, avoid loading splits that are ignored altogether (like the "test" stream in src/unitxt/catalog/splitters/small_no_test.json).
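The two goals above (defer loading until a split is streamed, and never load ignored splits) can be illustrated with a minimal Python sketch. It is not Unitxt's actual `LazyLoader` code: `LazySplitLoader`, `load_split`, `streams`, and `ignored` are hypothetical names, and the split names are assumed to be available as cheap metadata (for a Hugging Face dataset, `load_dataset_builder` can often provide this kind of information without downloading the data).

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set


class LazySplitLoader:
    """Hypothetical sketch: split names are known up front, but a split's data
    is fetched only when someone actually iterates over that split."""

    def __init__(self, split_names: List[str],
                 load_split: Callable[[str], Iterable[dict]]) -> None:
        self.split_names = list(split_names)  # cheap metadata, no data loaded
        self._load_split = load_split         # expensive call, deferred
        self.loaded: List[str] = []           # records which splits were really fetched

    def _stream(self, split: str) -> Iterator[dict]:
        # A generator body does not run until the first next(), so creating
        # the generator object does not trigger any loading.
        self.loaded.append(split)
        yield from self._load_split(split)

    def streams(self, ignored: Optional[Set[str]] = None) -> Dict[str, Iterator[dict]]:
        # Ignored splits never get a generator, so they are never loaded.
        ignored = ignored or set()
        return {name: self._stream(name)
                for name in self.split_names if name not in ignored}


def fake_load(split: str) -> List[dict]:
    # Stand-in for an expensive download (e.g. from a hub or a CSV file).
    return [{"split": split, "i": i} for i in range(3)]


loader = LazySplitLoader(["train", "validation", "test"], fake_load)
streams = loader.streams(ignored={"test"})
print(sorted(streams))  # ['train', 'validation']
print(loader.loaded)    # [] (nothing fetched yet)
first = next(streams["train"])
print(loader.loaded)    # ['train']
```

Using a generator keeps the deferral implicit: the loading cost is paid on the first `next()`, so consuming one "train" instance never touches "validation", and the ignored "test" split is never requested at all.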

@dafnapension force-pushed the lazy_loadHF branch 9 times, most recently from c8dbded to 4cf6a69 (January 23, 2025 10:43)
@dafnapension changed the title from "Procrastinater loadHF" to "Procrastinating loaders - load the data only when actually needs to stream through Unitxt" (Jan 23, 2025)
@dafnapension force-pushed the lazy_loadHF branch 4 times, most recently from 20834fa to 78d0927 (January 23, 2025 16:43)
@elronbandel changed the title from "Procrastinating loaders - load the data only when actually needs to stream through Unitxt" to "Cached driven lazy loaders: load the data only when actually needed and keep in cache while in use" (Jan 26, 2025)
@dafnapension force-pushed the lazy_loadHF branch 3 times, most recently from e00617f to 134dd22 (January 26, 2025 20:40)
@dafnapension force-pushed the lazy_loadHF branch 2 times, most recently from 016faa4 to a4121c7 (January 27, 2025 16:09)
@elronbandel changed the title from "Cached driven lazy loaders: load the data only when actually needed and keep in cache while in use" to "Use Lazy Loaders" (Feb 10, 2025)
@elronbandel merged commit 82a440f into main (Feb 10, 2025)
5 of 16 checks passed
@elronbandel deleted the lazy_loadHF branch (February 10, 2025 14:05)
@oktie pushed a commit that referenced this pull request (Feb 11, 2025)
* try lazy loadHF first

Signed-off-by: dafnapension <[email protected]>

* reduce benchmark profiling to generating the dataset only: no inferring (which is done by mocking anyhow) and no evaluating (of the mocked results). Also add trust_remote to load_dataset_builder

Signed-off-by: dafnapension <[email protected]>

* try procrastination for load csv too

Signed-off-by: dafnapension <[email protected]>

* added a split cache for the generators, log the limit once per dataset, and increase the loader cache

Signed-off-by: dafnapension <[email protected]>

* make the sklearn loader a lazy loader too

Signed-off-by: dafnapension <[email protected]>

* adjust to new readers for csv

Signed-off-by: dafnapension <[email protected]>

* Enhance LoadHF class to support optional splits and improve dataset loading logic

Signed-off-by: elronbandel <[email protected]>

* Refactor LoadHF class to improve dataset loading and implement limit on yielded instances

Signed-off-by: elronbandel <[email protected]>

* Refactor LoadHF class to streamline dataset loading and enhance split handling

Signed-off-by: elronbandel <[email protected]>

* Remove unused import and update line number in secrets baseline

Signed-off-by: elronbandel <[email protected]>

* Refactor load_data method to simplify error handling and remove unnecessary cache checks

Signed-off-by: elronbandel <[email protected]>

* Merge origin/main

Signed-off-by: elronbandel <[email protected]>

* Refactor loaders to implement LazyLoader class and update load_iterables method for improved streaming support

Signed-off-by: elronbandel <[email protected]>

* Update exception handling in test_failed_load_csv to catch general exceptions

Signed-off-by: elronbandel <[email protected]>

* Refactor LoadHF class to streamline data loading and enhance error handling

Signed-off-by: elronbandel <[email protected]>

---------

Signed-off-by: dafnapension <[email protected]>
Signed-off-by: elronbandel <[email protected]>
Co-authored-by: Elron Bandel <[email protected]>
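The commit list above mentions a split cache for the generators, an increased loader cache, logging the limit once, and a limit on yielded instances. The following minimal Python sketch shows how those ideas can fit together. It is an illustration under assumptions, not Unitxt's implementation: `CachedLimitedLoader`, `load_split`, `limit`, and `max_cached` are hypothetical names, a least-recently-used eviction policy is assumed for the bounded cache, and "log the limit once" is interpreted here as once per split.

```python
import itertools
import logging
from collections import OrderedDict
from typing import Callable, Iterator, List

logger = logging.getLogger("lazy_loader_sketch")


class CachedLimitedLoader:
    """Hypothetical sketch: keeps recently used splits in a bounded cache,
    caps the number of yielded instances, and logs the cap once per split."""

    def __init__(self, load_split: Callable[[str], List[dict]],
                 limit: int = 0, max_cached: int = 2) -> None:
        self._load_split = load_split
        self.limit = limit            # 0 means "no limit" in this sketch
        self.max_cached = max_cached  # assumed LRU bound on cached splits
        self._cache: "OrderedDict[str, List[dict]]" = OrderedDict()
        self._limit_logged = set()
        self.load_calls = 0

    def _get(self, split: str) -> List[dict]:
        if split in self._cache:
            self._cache.move_to_end(split)  # mark as most recently used
            return self._cache[split]
        self.load_calls += 1
        data = list(self._load_split(split))
        self._cache[split] = data
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # evict the least recently used split
        return data

    def stream(self, split: str) -> Iterator[dict]:
        data = self._get(split)  # runs on the first next(), not when stream() is called
        if not self.limit:
            yield from data
            return
        if split not in self._limit_logged:
            logger.info("Limiting split %r to %d instances", split, self.limit)
            self._limit_logged.add(split)
        yield from itertools.islice(data, self.limit)


loader = CachedLimitedLoader(lambda s: [{"split": s, "i": i} for i in range(10)],
                             limit=3, max_cached=1)
a = list(loader.stream("train"))
b = list(loader.stream("train"))          # served from the cache
print(len(a), len(b), loader.load_calls)  # 3 3 1
list(loader.stream("test"))               # loads "test" and evicts "train"
list(loader.stream("train"))              # "train" must be loaded again
print(loader.load_calls)                  # 3
```

With `max_cached=1` the eviction is easy to observe; a larger bound trades memory for fewer reloads, which is the kind of knob that increasing the loader cache would adjust.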