Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Depth anything v2 workflow block #875

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

reiffd7
Copy link
Contributor

@reiffd7 reiffd7 commented Dec 12, 2024

Description

This PR implements the Depth Anything V2 model integration as a workflow block, enabling depth map prediction from 2D images. The implementation provides:

  • Support for three model sizes (Small, Base, Large) with different performance characteristics
  • Multiple colormap visualization options
  • Normalized depth output for technical applications
  • Comprehensive documentation and user guidance

Dependencies:

  • transformers
  • PIL
  • numpy
  • matplotlib
  • supervision

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

The implementation has been tested with:

  1. Different input image formats (RGB images via PIL/numpy)
  2. All supported model sizes (Small, Base, Large)
  3. All available colormap options (Spectral_r, viridis, plasma, magma, inferno)
  4. Edge cases:
    • Handling images with uniform depth
    • Proper normalization of depth values
    • Error handling for invalid inputs

Any specific deployment considerations

  1. Model weights will be downloaded on first use from HuggingFace
  2. Different model sizes have varying resource requirements:
    • Small: ~25M parameters, ~60ms inference time
    • Base: ~335M parameters, ~213ms inference time
    • Large: ~891M parameters, ~5.2s inference time
  3. GPU acceleration recommended for optimal performance

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:
    • Added comprehensive block documentation in LONG_DESCRIPTION
    • Included detailed model size descriptions and performance characteristics
    • Added colormap options with visual descriptions
    • Provided usage examples and application scenarios
    • Documented all input parameters and outputs

@PawelPeczek-Roboflow
Copy link
Collaborator

  • transformers is installed as extras - I bet the contribution would make inference python package failing (triggered by import of blocks loader - not visible in CI, would be shown if you extend Test package install - inference with import of inference/core/workflows/core_steps/loader.py module
  • the approach for model loading would be extremally slow - what you have would make the model loading over and over again for consecutive request to server, would work ok in InferencePipeline

until the two above are clarified I will not approve the PR

@reiffd7
Copy link
Contributor Author

reiffd7 commented Dec 13, 2024

  • transformers is installed as extras - I bet the contribution would make inference python package failing (triggered by import of blocks loader - not visible in CI, would be shown if you extend Test package install - inference with import of inference/core/workflows/core_steps/loader.py module
  • the approach for model loading would be extremally slow - what you have would make the model loading over and over again for consecutive request to server, would work ok in InferencePipeline

until the two above are clarified I will not approve the PR

For first bullet point, can 'transformers' become a core dependency or is it too large to be included for just having one block? I can foresee more huggingface wrapper workflow blocks being built.

For the second bullet, we could set up a model cache within the block. This could keep the model in memory between instances. I'm not sure how well that would handle on a multi-process server. I was thinking that we could use the Huggingface Inference API to post an image and get a prediction returned. That would be much more lightweight. On the model card, https://huggingface.co/depth-anything/Depth-Anything-V2-Base-hf, I read that "This model does not have enough activity to be deployed to Inference API (serverless) yet. ncrease its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead." Thats a bummer, it has 34,918 downloads, would have to think it will hosted on the serverless API soon. I'm also now just discovering ModelManager. How do we host CLIP, Florence 2? I would imagine thats a similar situation where loading these models would be extremely slow.

@PawelPeczek-Roboflow
Copy link
Collaborator

the point with transformers as core dependency is not that obvious - my concern is around our very specific pins:
image
and the general trend that we can see in the issues - people do not like strict pins on popular libs as they want those to be always up to date - which is not possible when we have tight pins in requirements.

So - the problem is much more convoluted than moving the dependency around - you can take that hustle and try it out, potentially unleashing pins taking care that old things still work - but that's the hustle you need to take at the moment as I do not have bandwidth.

A quick solution would be to have import transformers in try-catch at block init and fail when someone attempts to run the block in env that do not have dependencies - with clear message what is wrong. That's not ideal, magical imports are something I personally hate, but we could live with that solution.

On the second bullet - the code you create would be unbearably slow to handle consecutive requests in inference server and I am almost sure that if I approve that, any complaints regarding speed would need to be addressed by me. We can also live with that, but effectively you commit to fixing future issues if they are raised - and also - if any marketing material needs to be created on top of that which would not be possible due to the fact it's to slow - that would also be on you to fix.

@hansent
Copy link
Contributor

hansent commented Jan 16, 2025

is this still active? would like to close if its stale

@reiffd7
Copy link
Contributor Author

reiffd7 commented Jan 16, 2025

Yes its still active. I'm going to get back to this soon. I want to reimplement without huggingface for better performance. Depth anything has come up in a bunch of use cases so its still really important for us to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants