Releases: cleanlab/cleanvision
v0.3.6 Improved issue type odd_size, minor bug fix and version updates
- Odd size issue
  We now use the IQR method instead of a hard threshold to detect images that are odd sized compared to the rest of the dataset. An image is marked as odd sized if size > q1 + 3 * IQR or size < q3 - 3 * IQR, where q1 and q3 are the 25th and 75th percentiles respectively.
- Statistics
  imagelab.info['statistics'] now provides key statistics (mean, std, min, max, 25%, 50%, and 75%) for all the image properties computed while looking for issues.
- Bug fix
  The image was being resized to zero width/height for the blurry issue check in cases where the aspect ratio was unusual.
- CI pipeline
  Version updates for black, flake8, docs requirements and the datasets library.
  Added a cron schedule for running tests.
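The odd size rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not CleanVision's implementation: the function name find_odd_sizes, the iqr_factor parameter and the sample sizes are made up for the example. The thresholds follow the note as written; the more common convention of fencing at q1 - k * IQR and q3 + k * IQR would only change the two threshold lines.

```python
import statistics


def find_odd_sizes(sizes, iqr_factor=3.0):
    """Return the indices of sizes that fall outside the IQR-based thresholds.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Quartiles with linear interpolation over the observed data.
    q1, _, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    iqr = q3 - q1
    # Thresholds exactly as stated in the release note.
    upper = q1 + iqr_factor * iqr
    lower = q3 - iqr_factor * iqr
    return [i for i, s in enumerate(sizes) if s > upper or s < lower]


# Made-up sizes: most images are similar, one is huge and one is tiny.
sizes = [90, 95, 100, 100, 105, 110, 1000, 5]
odd_indices = find_odd_sizes(sizes)
print(odd_indices)
```

Because the thresholds are derived from the dataset's own quartiles, the same code adapts to datasets of small thumbnails and datasets of large photos without retuning a fixed cutoff.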
Related PRs
- Bumped up version to 0.3.5 by @sanjanag in #244
- HF datasets version fix by @sanjanag in #249
- Run CI pipeline on a cron schedule by @sanjanag in #248
- Added a min size of 1 to avoid zero width/length in resize by @sanjanag in #250
- Odd size revamp by @sanjanag in #247
- Links update by @sanjanag in #251
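The zero width/height fix (#250) can be illustrated with a small sketch. It assumes the image is scaled so that its longer side matches a target length; the function name resized_dims and the max_side parameter are hypothetical and are not taken from the CleanVision code.

```python
def resized_dims(width, height, max_side=64):
    """Scale (width, height) so the longer side equals max_side.

    Hypothetical helper: each side is clamped to at least 1 pixel so that
    extreme aspect ratios never produce a zero-sized image.
    """
    longest = max(width, height)
    # Integer arithmetic avoids floating point rounding surprises.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


# A very wide image: without the clamp the height would round down to 0.
print(resized_dims(10000, 10))
```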
Full Changelog: v0.3.5...v0.3.6
v0.3.5 Improved documentation, enhanced testing, and codebase refinements
- Improved README, added FAQ page and updated Development guide with instructions for building docs.
- Added tests for truncating titles in visualization, updated dev requirements, and fixed type checking issue.
- Added code for raising an exception on receiving duplicate files in the filepaths argument.
- Added a PR template.
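A duplicate check of this kind might look like the sketch below. The function name check_unique_filepaths, the choice of ValueError and the message text are assumptions made for illustration; the release does not specify them.

```python
from collections import Counter


def check_unique_filepaths(filepaths):
    """Return the filepaths unchanged, or raise if any path appears twice.

    Hypothetical helper; the exception type and message are assumptions.
    """
    counts = Counter(filepaths)
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate filepaths found: {duplicates}")
    return list(filepaths)


paths = check_unique_filepaths(["a.jpg", "b.jpg", "c.jpg"])
```

Failing early keeps a repeated file from being reported as an exact duplicate of itself later on.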
Detailed changes
- Update version by @sanjanag in #220
- Update README text by @jwmueller in #221
- Added missing requirements for dev by @sanjanag in #225
- Added tests for truncate_titles() by @aenlemmea in #232
- Removed type ignore according to type check by @sanjanag in #234
- modify python source code example by @developer0hye in #233
- Added faq page by @sanjanag in #235
- Added instructions for building docs locally by @sanjanag in #238
- improve formatting of the quickstart tutorial by @jwmueller in #240
- Simple Clean up, .gitignore improved, PR template added by @smttsp in #227
- Raise an exception on receiving duplicate filepaths by @sanjanag in #242
New Contributors
- @aenlemmea made their first contribution in #232
- @developer0hye made their first contribution in #233
- @smttsp made their first contribution in #227
Full Changelog: v0.3.4...v0.3.5
v0.3.4 Improved time taken to find issues in large datasets.
Improved time taken to find issues in large datasets.
What's Changed
- Version bump to 0.3.4 by @sanjanag in #218
- Fixed a time consuming line in odd size image property by @sanjanag in #217
- Update docs sidebar by @jwmueller in #219
Full Changelog: v0.3.3...v0.3.4
v0.3.3 Added support for cloud datasets
Added support for running CleanVision on datasets residing in cloud storage, including AWS, Google storage and Azure storage.
What's Changed
- Bumped up patch version by @sanjanag in #212
- Commented bump version job by @sanjanag in #213
- Adding support for fsspec datasets by @LemurPwned in #176
- docs: put tutorials above API reference in sidebar by @jwmueller in #216
New Contributors
- @LemurPwned made their first contribution in #176
Full Changelog: v0.3.2...v0.3.3
v0.3.2 Visualization can also show an ID of the image along with score of the image
The ID of an image can now be shown in the visualization of issues in the report. This functionality makes the CleanVision integration in cleanlab more seamless.
Detailed changes
- Bumped patch version by @sanjanag in #208
- Added title info to visualization with optional show_id arg by @sanjanag in #211
Full Changelog: v0.3.1...v0.3.2
v0.3.1 Added a new issue check, improvements in visualization and support for integration in cleanlab
What's Changed
- Added a new issue check, odd_size, for detecting images that are too small or too large in area relative to the dataset.
- Added support for CleanVision integration in the cleanlab repo. This enables checking for image issues from the cleanlab package as well.
- Long image titles in visualization are now truncated based on the longest common prefix/suffix.
- Supported more hash types in the near_duplicates issue check.
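One way to shorten titles using their longest common prefix and suffix is sketched below. It is an assumption about the general idea only: the helper name strip_common_affixes and its behaviour are hypothetical and may differ from what the visualization code does.

```python
import os


def strip_common_affixes(titles):
    """Drop the prefix and suffix shared by every title.

    Hypothetical helper for illustration. A title is kept unchanged if
    stripping would leave it empty.
    """
    if len(titles) < 2:
        return list(titles)
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on paths.
    prefix = os.path.commonprefix(titles)
    suffix = os.path.commonprefix([t[::-1] for t in titles])[::-1]
    stripped = []
    for title in titles:
        core = title[len(prefix):len(title) - len(suffix)]
        stripped.append(core if core else title)
    return stripped


titles = ["/data/train/cat_001.jpg", "/data/train/dog_042.jpg"]
print(strip_common_affixes(titles))
```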
Detailed changes
- Version bump to 0.3.1 by @sanjanag in #189
- Updated links by @sanjanag in #190
- Update dataset by @sanjanag in #191
- Feat size: Added detection for images which are way bigger than average. by @wirthual in #175
- Update odd size 1-line description by @jwmueller in #196
- Truncate long titles in plots by @manulpatel in #181
- Directing discussions to slack by @sanjanag in #199
- Added link checker workflow by @manulpatel in #198
- expand list of supported file formats to consider more cases by @jwmueller in #200
- Datalab integration by @sanjanag in #166
- Testing for windows by @sanjanag in #201
- Add ahash, dhash and chash to near_duplicates issue by @bluelul in #203
- Updated development guide with formatting and styling commands by @sanjanag in #204
Full Changelog: v0.3.0...v0.3.1
v0.3.0 Major improvements in dark and blurry issue types, scoring for duplicate issues
Improvement in blurry check
Improved the blurry check logic to produce fewer false positives and to catch blurry images that previously went undetected in the dataset.
Here are some examples of images that were falsely identified as blurry previously
Here are some examples of blurry images that were discovered after improvement
Improvement in dark check
Images that were previously falsely identified as dark
Scoring for near and exact duplicate issue types
Introduced scores for near and exact duplicate checks. The score of an image is inversely proportional to the number of images identified as its duplicates. Here's an example of what the scores look like: the top 3 images are exact duplicates of each other, and so on.
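The scoring idea can be sketched as follows, assuming each image in a duplicate set receives a score of 1 divided by the size of that set and every other image scores 1.0. The function name duplicate_scores and this exact formula are assumptions for illustration; the release only states that the score is inversely proportional to the number of duplicates.

```python
def duplicate_scores(duplicate_sets, all_images):
    """Assign each image a score in (0, 1]; lower means more duplicates.

    Hypothetical helper: 1 / len(set) is an assumed form of "inversely
    proportional", not necessarily the formula CleanVision uses.
    """
    scores = {image: 1.0 for image in all_images}
    for group in duplicate_sets:
        for image in group:
            scores[image] = 1.0 / len(group)
    return scores


sets = [["a.jpg", "b.jpg", "c.jpg"], ["d.jpg", "e.jpg"]]
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
scores = duplicate_scores(sets, images)
```

Sorting by ascending score then lists the largest duplicate sets first, which matches the ordering described above.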
Changelog
- Adds test for update_df from utils by @Kadam-Tushar in #165
- Incremented version by @sanjanag in #164
- Added workflow for automatic release by @sanjanag in #162
- make package pep561 compatible. by @wirthual in #170
- Added save load functionality by @sanjanag in #168
- Minor updates by @sanjanag in #177
- Added flake8 to pre-commit hooks by @manulpatel in #180
- Fixed missing mypy stub error by @sanjanag in #186
- Updated readthedocs config and requirements by @sanjanag in #187
- Updated threshold for dark issue by @sanjanag in #179
- Added score for duplicate images by @sanjanag in #183
- Blurry by @sanjanag in #185
- Updated minor version by @sanjanag in #188
- Changed import for Imagelab by @sanjanag in #182
New Contributors
- @wirthual made their first contribution in #170
- @manulpatel made their first contribution in #180
Full Changelog: v0.2.1...v0.3.0
v0.2.1 Updated cleanvision for torch datasets, pre-commit hooks and documentation
What's Changed
- Updated version to 0.2.1 by @sanjanag in #154
- Updated torch n_jobs=1 by @sanjanag in #156
- pre-commit hook for clearing notebooks output and metadata by @Kadam-Tushar in #161
- allow cleanvision.__version__ to work by @jwmueller in #160
- Added notebooks to documentation by @sanjanag in #158
- docs: fixed example script link in readme by @sanjanag in #163
New Contributors
- @Kadam-Tushar made their first contribution in #161
Full Changelog: v0.2.0...v0.2.1
v0.2.0 -- Added support for HuggingFace and Torchvision datasets
What's Changed
Added support for running CleanVision on HuggingFace and torchvision datasets.
v0.1.1 -- Bugfix - Images loaded twice in Windows OS
What's Changed
- Fixed a bug where images were loaded twice on Windows, caused by glob.glob() behaving differently across operating systems. #143