-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VideoSwin Backbone #1808
base: keras-hub
Are you sure you want to change the base?
VideoSwin Backbone #1808
Conversation
* Agg Vgg16 backbone * update names * update tests * update test * add image classifier * incorporate review comments * Update test case * update backbone test * add image classifier * classifier cleanup * code reformat * add vgg16 image classifier * make vgg generic * update doc string * update docstring * add classifier test * update tests * update docstring * address review comments * code reformat * update the configs * address review comments * fix task saved model test * update init * code reformatted
* Add ResNetV1 and ResNetV2 * Address comments
* Add CSP DarkNet * Add CSP DarkNet * snake_case function names * change use_depthwise to block_type
…Backbone` (keras-team#1769) * Add FeaturePyramidBackbone and update ResNetBackbone * Simplify the implementation * Fix CI * Make ResNetBackbone compatible with timm and add FeaturePyramidBackbone * Add conversion implementation * Update docstrings * Address comments
* Add DenseNet * fix testcase * address comments * nit * fix lint errors * move description
The input variable names are not fixed yet. For image classifier backbones |
Also, in the classifier I have used imageclassifier, which needs to be updated to video classifier |
We can ignore nightly failure, but please for |
Any way is fine, but it might be easiest to start with the backbone, and just add the task as a follow up. Can you include some colab showing that we have correct numerics? E.g. load some reference implementation, assign the weights over, show our implementation matches? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial comments. Make sure to take time to check for the style of things in the repo and do some rewriting of the source code as we bring it over. We want to make sure this is a proper port that leaves us with a uniform UX for all models we have checked in.
include_rescaling=False, | ||
image_shape=(32, 224, 224, 3), | ||
embed_dim=96, | ||
patch_size=[2, 4, 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably call the arguments that vary per layer either layerwise_patch_size
or stackwise_patch_size
, etc, for consistency with other models.
def __init__( | ||
self, | ||
include_rescaling=False, | ||
image_shape=(32, 224, 224, 3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this model is capable of taking in different numbers of frames or different image sizes, this should probably be (None, None, None, 3)
.
|
||
def __init__( | ||
self, | ||
include_rescaling=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove defaults that are specific to a given model size. patch_size
, window_size
, depths
and num_heads
should all have no default here. The defaults will be in the individual preset configs.
): | ||
|
||
# === Functional Model === | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty newline
image_shape (tuple[int], optional): The size of the input video in | ||
`(depth, height, width, channel)` format. | ||
Defaults to `(32, 224, 224, 3)`. | ||
include_rescaling (bool, optional): Whether to rescale the inputs. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format lines to 80 characters. This looks like we are going over.
self.embed_dim = embed_dim | ||
self.norm_layer = norm_layer | ||
|
||
def __compute_padding(self, dim, patch_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no one underscore? Double underscore is usually reserved for python builtins
**kwargs, | ||
): | ||
super().__init__(**kwargs) | ||
# variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config? variables usually refers to trainable weights (these are not)
**kwargs, | ||
): | ||
super().__init__(**kwargs) | ||
# variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, these aren'te variables
super().__init__(**kwargs) | ||
self.output_dim = output_dim | ||
self.hidden_dim = hidden_dim | ||
self._activation_identifier = activation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this in a few places, we should not do it.
in init
self.activation = keras.activations.get(activation)
in get_config
"activation": keras.activations.serialize(self.activation),
patch_size=(4, 4, 4), embed_dim=96 | ||
) | ||
config = patch_embedding_model.get_config() | ||
assert isinstance(config, dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrite all of these tests to match the repo style. We do not use assert
anywhere.
753047d
to
a5e5d8f
Compare
No description provided.