Releases · cleanlab/cleanvision

13 Feb 19:00

sanjanag

v0.3.6

8d8ffaf

v0.3.6 Improved issue type odd_size, minor bug fix and version updates Latest

Odd size issue
We now use the IQR method instead of a hard threshold to detect images whose size is unusual compared to the rest of the dataset. An image is marked as odd sized if size > q3 + 3 * IQR or size < q1 - 3 * IQR, where q1 and q3 are the 25th and 75th percentiles respectively and IQR = q3 - q1.
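The following is a minimal plain-Python sketch of this rule. It is for illustration only and is not cleanvision's implementation. It assumes `sizes` holds one numeric size per image, such as its area. The function name `find_odd_sized` is hypothetical. It uses `statistics.quantiles(..., method="inclusive")` because that method interpolates percentiles the same way as NumPy's default linear method.

```python
import statistics


def find_odd_sized(sizes, iqr_factor=3.0):
    """Return indices of sizes outside the IQR fences (hypothetical helper).

    sizes: one numeric size per image (assumed, e.g. its area).
    iqr_factor: multiplier on the IQR; 3 per the release notes.
    """
    # Inclusive method = linear interpolation, like numpy.percentile's default
    q1, _, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - iqr_factor * iqr
    upper = q3 + iqr_factor * iqr
    return [i for i, s in enumerate(sizes) if s < lower or s > upper]
```

With sizes `[100, 102, 98, 101, 99, 100, 103, 97, 5000, 2]`, q1 is 98.25 and q3 is 101.75. That puts the fences at 87.75 and 112.25, so indices 8 and 9 are flagged. Because the fences are relative to the dataset's own spread, a single hard threshold is not needed.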
Statistics
imagelab.info['statistics'] now provides key statistics (mean, std, min, max, and the 25%, 50% and 75% percentiles) for every image property computed while checking for issues.
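These are the same fields that pandas' `describe()` reports, apart from count. The stdlib sketch below computes them for a single property. The helper name and dict layout are hypothetical and are not the actual structure of `imagelab.info['statistics']`. Using the sample standard deviation, as pandas does, is also an assumption.

```python
import statistics


def summarize_property(values):
    """Summary statistics for one image property (hypothetical layout)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (assumed, as in pandas)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }
```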
Bug fix
Fixed a bug where the blurry issue check resized images with an unusual aspect ratio to zero width or height.
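The fix in #250 enforces a minimum size of 1 on each resized side. Below is a sketch of aspect-ratio-preserving resizing with that clamp. It assumes a hypothetical target of 64 pixels for the longer side; the size cleanvision actually uses for the blurry check is not stated here. The arithmetic is integer-only, which keeps the result exact.

```python
def resized_dims(width, height, max_side=64):
    """Scale (width, height) so the longer side equals max_side.

    Hypothetical helper; max_side is illustrative. Without the max(1, ...)
    clamp, the shorter side of a very thin image rounds down to 0.
    """
    longest = max(width, height)
    new_w = max(1, width * max_side // longest)
    new_h = max(1, height * max_side // longest)
    return new_w, new_h
```

For a 4000 x 20 image the shorter side would otherwise become 0. With the clamp, the result is (64, 1).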
CI pipeline
Updated the versions of black, flake8, the docs requirements and the datasets library.
Added cron schedule for running tests.

Related PRs

Bumped up version to 0.3.5 by @sanjanag in #244
HF datasets version fix by @sanjanag in #249
Run CI pipeline on a cron schedule by @sanjanag in #248
Added a min size of 1 to avoid zero width/length in resize by @sanjanag in #250
Odd size revamp by @sanjanag in #247
Links update by @sanjanag in #251

Full Changelog: v0.3.5...v0.3.6

Contributors

sanjanag

30 Nov 15:38

sanjanag

v0.3.5

d1334e0

v0.3.5 Improved documentation, enhanced testing, and codebase refinements

Improved the README, added an FAQ page and updated the development guide with instructions for building the docs.
Added tests for truncating titles in visualizations, updated the dev requirements and fixed a type-checking issue.
An exception is now raised when the filepaths argument contains duplicate files.
Added a PR template.
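Here is a sketch of a duplicate-filepath check using `collections.Counter`. The helper name is hypothetical. Raising `ValueError` is an assumption, because these notes do not say which exception type cleanvision raises.

```python
from collections import Counter


def check_unique_filepaths(filepaths):
    """Raise if any path appears more than once (hypothetical helper)."""
    duplicates = [path for path, count in Counter(filepaths).items() if count > 1]
    if duplicates:
        # Exception type is assumed for illustration
        raise ValueError(f"Duplicate filepaths found: {duplicates}")
```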

Detailed changes

Update version by @sanjanag in #220
Update README text by @jwmueller in #221
Added missing requirements for dev by @sanjanag in #225
Added tests for truncate_titles() by @aenlemmea in #232
Removed type ignore according to type check by @sanjanag in #234
modify python source code example by @developer0hye in #233
Added faq page by @sanjanag in #235
Added instructions for building docs locally by @sanjanag in #238
improve formatting of the quickstart tutorial by @jwmueller in #240
Simple Clean up, .gitignore improved, PR template added by @smttsp in #227
Raise an exception on receiving duplicate filepaths by @sanjanag in #242

New Contributors

@aenlemmea made their first contribution in #232
@developer0hye made their first contribution in #233
@smttsp made their first contribution in #227

Full Changelog: v0.3.4...v0.3.5

Contributors

jwmueller, smttsp, and 3 other contributors

01 Sep 11:12

sanjanag

v0.3.4

723abc7

v0.3.4 Improved time taken to find issues in large datasets.

Reduced the time taken to find issues in large datasets.

What's Changed

Version bump to 0.3.4 by @sanjanag in #218
Fixed a time consuming line in odd size image property by @sanjanag in #217
Update docs sidebar by @jwmueller in #219

Full Changelog: v0.3.3...v0.3.4

Contributors

jwmueller and sanjanag

09 Aug 06:00

sanjanag

v0.3.3

6ba83ca

v.0.3.3 Added support for cloud datasets

Added support for running cleanvision on datasets stored in the cloud, including AWS, Google Cloud Storage and Azure Storage.
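Cloud support was added through fsspec (#176), which picks a filesystem implementation based on a path's protocol prefix. The stdlib sketch below illustrates only that dispatch step. The `storage_backend` helper and its mapping are illustrative, and the exact set of prefixes cleanvision accepts is an assumption. The package names are the usual fsspec implementations for each provider (s3fs, gcsfs, adlfs).

```python
from urllib.parse import urlparse

# Illustrative protocol -> fsspec implementation mapping (assumed subset)
BACKENDS = {
    "s3": "s3fs (AWS S3)",
    "gs": "gcsfs (Google Cloud Storage)",
    "gcs": "gcsfs (Google Cloud Storage)",
    "az": "adlfs (Azure Storage)",
    "abfs": "adlfs (Azure Storage)",
}


def storage_backend(path):
    """Name the backend a dataset path would be read with (hypothetical helper)."""
    scheme = urlparse(path).scheme.lower()
    # No scheme, "file://", or a Windows drive letter such as "C:" means local
    if scheme in ("", "file") or len(scheme) == 1:
        return "local filesystem"
    if scheme not in BACKENDS:
        raise ValueError(f"Unsupported storage protocol: {scheme!r}")
    return BACKENDS[scheme]
```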

What's Changed

Bumped up patch version by @sanjanag in #212
Commented bump version job by @sanjanag in #213
Adding support for fsspec datasets by @LemurPwned in #176
docs: put tutorials above API reference in sidebar by @jwmueller in #216

New Contributors

@LemurPwned made their first contribution in #176

Full Changelog: v0.3.2...v0.3.3

Contributors

jwmueller, sanjanag, and LemurPwned

17 Jul 03:21

sanjanag

v0.3.2

ff59d69

v0.3.2 Visualization can also show an ID of the image along with score of the image

The ID of an image can now be shown alongside its score when visualizing issues in the report. This was added to make the cleanvision integration in cleanlab more seamless.

Detailed changes

Bumped patch version by @sanjanag in #208
Added title info to visualization with optional show_id arg by @sanjanag in #211

Full Changelog: v0.3.1...v0.3.2

Contributors

sanjanag

07 Jul 19:56

sanjanag

v0.3.1

3219e79

v0.3.1 Added a new issue check, improvements in visualization and support for integration in cleanlab

What's Changed

Added a new issue check odd_size for detecting images that are too small or too large in area relative to the dataset

Added support for integrating cleanvision into cleanlab, so image issues can also be checked from the cleanlab package.
Long image titles in visualizations are now truncated based on their longest common prefix/suffix.
Added support for more hash types in the near_duplicates issue check.
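The sketch below truncates titles by their longest common prefix and suffix. It applies `os.path.commonprefix`, which compares strings character by character, first to the titles and then to their reversed remainders. The `...` markers and the helper name are assumptions; cleanvision's own `truncate_titles()` may behave differently.

```python
import os


def truncate_common_affixes(titles):
    """Drop the prefix/suffix shared by all titles (hypothetical helper)."""
    if len(titles) < 2:
        return list(titles)
    prefix = os.path.commonprefix(titles)
    rest = [t[len(prefix):] for t in titles]
    # Common suffix = common prefix of the reversed remainders
    suffix = os.path.commonprefix([r[::-1] for r in rest])[::-1]
    out = []
    for r in rest:
        middle = r[: len(r) - len(suffix)]
        out.append(("..." if prefix else "") + middle + ("..." if suffix else ""))
    return out
```

For `/data/images/cat_001.png` and `/data/images/dog_002.png`, this keeps only `...cat_001...` and `...dog_002...`, which is the part that distinguishes the images.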

Detailed changes

Version bump to 0.3.1 by @sanjanag in #189
Updated links by @sanjanag in #190
Update dataset by @sanjanag in #191
Feat size: Added detection for images which are way bigger than average. by @wirthual in #175
Update odd size 1-line description by @jwmueller in #196
Truncate long titles in plots by @manulpatel in #181
Directing discussions to slack by @sanjanag in #199
Added link checker workflow by @manulpatel in #198
expand list of supported file formats to consider more cases by @jwmueller in #200
Datalab integration by @sanjanag in #166
Testing for windows by @sanjanag in #201
Add ahash, dhash and chash to near_duplicates issue by @bluelul in #203
Updated development guide with formatting and styling commands by @sanjanag in #204
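Two of the hash types added in #203, aHash and dHash, can be sketched in a few lines. The sketch assumes the image has already been converted to grayscale and downscaled to a small grid, as libraries such as imagehash do first. The helper names are hypothetical and chash is not covered. The distance threshold cleanvision uses to call two images near duplicates is also not shown.

```python
def average_hash(pixels):
    """aHash: one bit per pixel, set when it is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [int(p > mean) for p in flat]


def difference_hash(pixels):
    """dHash: one bit per horizontal neighbour pair, set when the right pixel is brighter."""
    return [int(row[i + 1] > row[i]) for row in pixels for i in range(len(row) - 1)]


def hamming(a, b):
    """Number of differing bits; a small distance suggests near duplicates."""
    return sum(x != y for x, y in zip(a, b))
```

Both hashes ignore a uniform brightness shift: `[[10, 20], [30, 40]]` and `[[20, 30], [40, 50]]` produce identical hashes.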

New Contributors

@bluelul made their first contribution in #203

Full Changelog: v0.3.0...v0.3.1

Contributors

jwmueller, wirthual, and 3 other contributors

24 May 22:25

sanjanag

v0.3.0

c332a25

v.0.3.0 Major improvements in dark and blurry issue types, scoring for duplicate issues

Improvement in blurry check
Improved the blurry check logic to produce fewer false positives and to catch blurry images in the dataset that previously went undetected.

Here are some examples of images that were falsely identified as blurry previously

Here are some examples of blurry images that were discovered after improvement

Improvement in dark check
Images that were previously falsely identified as dark

Scoring for near and exact duplicate issue types
Introduced scores for the near and exact duplicate checks. The score of an image is inversely proportional to the number of images identified as its duplicates. Here's an example of what the scores look like: the top 3 images are exact duplicates of each other, and so on.
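This sketch scores exact duplicates by grouping images on a hash of their bytes and giving each image a score of 1 / (size of its group). It assumes the count includes the image itself, so a unique image scores 1 and each of three exact duplicates scores 1/3. The function name and the choice of MD5 are assumptions, not cleanvision's code.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_scores(images):
    """images: mapping name -> raw bytes. Returns name -> score in (0, 1]."""
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.md5(data).hexdigest()].append(name)
    scores = {}
    for members in groups.values():
        # Larger duplicate groups give lower scores (more likely an issue)
        for name in members:
            scores[name] = 1 / len(members)
    return scores
```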

Changelog

Adds test for update_df from utils by @Kadam-Tushar in #165
Incremented version by @sanjanag in #164
Added workflow for automatic release by @sanjanag in #162
make package pep561 compatible. by @wirthual in #170
Added save load functionality by @sanjanag in #168
Minor updates by @sanjanag in #177
Added flake8 to pre-commit hooks by @manulpatel in #180
Fixed missing mypy stub error by @sanjanag in #186
Updated readthedocs config and requirements by @sanjanag in #187
Updated threshold for dark issue by @sanjanag in #179
Added score for duplicate images by @sanjanag in #183
Blurry by @sanjanag in #185
Updated minor version by @sanjanag in #188
Changed import for Imagelab by @sanjanag in #182

New Contributors

@wirthual made their first contribution in #170
@manulpatel made their first contribution in #180

Full Changelog: v0.2.1...v0.3.0

Contributors

wirthual, sanjanag, and 2 other contributors

19 Apr 21:55

sanjanag

v0.2.1

376ecfb

v.0.2.1 Updated cleanvision for torch datasets, pre-commit hooks and documentation

What's Changed

Updated version to 0.2.1 by @sanjanag in #154
Updated torch n_jobs=1 by @sanjanag in #156
pre-commit hook for clearing notebooks output and metadata by @Kadam-Tushar in #161
allow cleanvision.version to work by @jwmueller in #160
Added notebooks to documentation by @sanjanag in #158
docs: fixed example script link in readme by @sanjanag in #163

New Contributors

@Kadam-Tushar made their first contribution in #161

Full Changelog: v0.2.0...v0.2.1

Contributors

jwmueller, sanjanag, and Kadam-Tushar

11 Apr 03:31

sanjanag

v0.2.0

bab8f3a

v.0.2.0 -- Added support for HuggingFace and Torchvision datasets

What's Changed

Added support for running CleanVision on HuggingFace and torchvision datasets.

31 Mar 18:37

sanjanag

v0.1.1

6335370

v.0.1.1 -- Bugfix - Images loaded twice in Windows OS

What's Changed

Fixed a bug where images were loaded twice on Windows, caused by glob.glob() behaving differently across operating systems (#143).
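Here is one way this kind of bug can arise. On Windows, `glob` matches patterns case-insensitively, so searching for both `*.jpg` and `*.JPG` returns the same file twice; on Linux it does not. The sketch below shows a file search that de-duplicates its results under that assumption. The helper name and extension list are hypothetical.

```python
import glob
import os


def find_images(folder, extensions=("png", "jpg", "jpeg")):
    """Collect image paths once each (hypothetical helper).

    Both lower- and upper-case patterns are globbed so 'b.JPG' is found on
    case-sensitive systems; normcase-based keys drop the repeat matches that
    case-insensitive globbing (as on Windows) would otherwise produce.
    """
    seen, paths = set(), []
    for ext in extensions:
        for pattern in (f"*.{ext}", f"*.{ext.upper()}"):
            for path in glob.glob(os.path.join(folder, pattern)):
                key = os.path.normcase(os.path.abspath(path))
                if key not in seen:
                    seen.add(key)
                    paths.append(path)
    return sorted(paths)
```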
