
Drop deprecated kube-rbac-proxy #479

Merged
merged 1 commit into confidential-containers:main from drop-kube-rbac-proxy
Jan 23, 2025

Conversation

mythi
Contributor

@mythi mythi commented Jan 3, 2025

Member

@bpradipt bpradipt left a comment


/lgtm

@mythi
Contributor Author

mythi commented Jan 7, 2025

#466 needs to be merged first, since my preference is to have that PR update the dependencies (and then I'd drop 91656ed here).

@mythi mythi force-pushed the drop-kube-rbac-proxy branch 2 times, most recently from a15c8d5 to ae7a56d on January 8, 2025 at 15:30
Member

@fidencio fidencio left a comment


lgtm, thanks @mythi!

@mythi
Contributor Author

mythi commented Jan 9, 2025

The remaining failure seems to be because of

Warning  Evicted    5m3s  kubelet            The node had condition: [DiskPressure].

@mythi
Contributor Author

mythi commented Jan 13, 2025

I believe this hits #481 now. @ldoktor is there a way to recover from this corruption?

@ldoktor
Contributor

ldoktor commented Jan 13, 2025

Hello @mythi, one way is to find out what stage the uninstall is in and recover the config based on that. Usually it suffices to remove the imports that are gone from /etc/containerd/config.yaml.d/XXX from the imports=[] list in config.yaml and restart containerd (unless it manages to restart automatically :D), and it should proceed.

Anyway, I'm playing with this issue. I have one little workaround that improves things (kata-deploy and pre-reqs in the operator), but I'm trying to find the root cause first (which is proving to be more complex than I thought). I can submit the workarounds if you think it's worth it, but I'd rather spend more time on the actual fix first: the kata-deploy=cleanup label is set immediately even when I use sleep 2m in kata-deploy before the labeling, and I'm trying to find out why, since we rely on this label in the operator cleanup to start post-cleanup.

@mythi
Contributor Author

mythi commented Jan 13, 2025

Usually it suffices to remove the imports that are gone from /etc/containerd/config.yaml.d/XXX in config.yaml

OK, I've not followed the work done in this space. It used to be that imports=[] was not used and we "patched" the main config.yaml for the runtimeclass handlers.

My main open question/concern is whether the failure is related to my PR or to flaky tests, since it'd be important to have this change in for the next release.

@wainersm
Member

IIUC this PR is depending on #483 to get CI passing.

@mythi
Contributor Author

mythi commented Jan 22, 2025

IIUC this PR is depending on #483 to get CI passing.

Correct. I stopped restarting the failing jobs after @ldoktor started working on the improvements.

kube-rbac-proxy was historically used to protect the metrics endpoint.
However, its usage has been discontinued in Kubebuilder. The default
scaffold now leverages the WithAuthenticationAndAuthorization feature
provided by controller-runtime.

This feature provides integrated support for securing metrics endpoints
by embedding authentication (authn) and authorization (authz) mechanisms
directly into the controller manager's metrics server, replacing the
need for kube-rbac-proxy to secure metrics endpoints.

Upgrade CoCo operator manually to follow the latest scaffolding
for WithAuthenticationAndAuthorization.

Signed-off-by: Mikko Ylinen <[email protected]>
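
For context (not part of the commit itself): below is a minimal sketch of how a Kubebuilder-style main.go wires WithAuthenticationAndAuthorization into the controller-runtime metrics server. The flag names mirror the container args patched in this PR, but the defaults and wiring shown here are illustrative, not the operator's actual code.

package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

func main() {
	var metricsAddr string
	var secureMetrics bool
	// These flags correspond to the --metrics-bind-address and --metrics-secure
	// args added to the manager deployment; the defaults here are illustrative.
	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8443", "The address the metrics endpoint binds to.")
	flag.BoolVar(&secureMetrics, "metrics-secure", true, "Serve metrics over HTTPS with authn/authz instead of plain HTTP.")
	flag.Parse()

	metricsOpts := metricsserver.Options{
		BindAddress:   metricsAddr,
		SecureServing: secureMetrics,
	}
	if secureMetrics {
		// controller-runtime's built-in filter replaces kube-rbac-proxy: it
		// authenticates (TokenReview) and authorizes (SubjectAccessReview)
		// every request before serving /metrics.
		metricsOpts.FilterProvider = filters.WithAuthenticationAndAuthorization
	}

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Metrics: metricsOpts,
	})
	if err != nil {
		panic(err)
	}
	// Controllers would be registered here before calling mgr.Start(ctrl.SetupSignalHandler()).
	_ = mgr
}
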
@mythi mythi force-pushed the drop-kube-rbac-proxy branch from ae7a56d to 9db2021 on January 23, 2025 at 14:14
@fidencio fidencio merged commit e42d13e into confidential-containers:main Jan 23, 2025
18 checks passed
value: "--metrics-bind-address=:8443"
- op: add
path: /spec/template/spec/containers/0/args/0
value: "--metrics-secure"
Contributor


@mythi I'm using DISABLECVM="true" on a v1.30.6 EKS cluster, and the usual way to deploy CAA failed to deploy the controller-manager; in the logs it complains about this argument:

oc logs -n confidential-containers-system pods/cc-operator-controller-manager-78454cf6d8-qj9xg 
flag provided but not defined: -metrics-secure
Usage of /manager:
  -cc-runtime-namespace string
        The namespace where CcRuntime secondary resources are created (default "kube-system")
  -health-probe-bind-address string
        The address the probe endpoint binds to. (default ":8081")
  -kubeconfig string
        Paths to a kubeconfig. Only required if out-of-cluster.
  -leader-elect
        Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
  -metrics-bind-address string
        The address the metric endpoint binds to. (default ":8080")
  -peer-pods
        Enable Peerpod controllers.
  -zap-devel
        Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error) (default true)
  -zap-encoder value
        Zap log encoding (one of 'json' or 'console')
  -zap-log-level value
        Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
  -zap-stacktrace-level value
        Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
  -zap-time-encoding value
        Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.

Removing it helps, but my question is: are there any new requirements tied to this change?
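
(For what it's worth, the error itself is standard Go flag handling: the v0.10.0 manager binary never registers --metrics-secure, so flag parsing prints the usage text and exits. A small, hypothetical illustration of that behavior, not operator code:)

package main

import "flag"

func main() {
	// Register only the flags the old (pre-#479) manager binary knows about.
	fs := flag.NewFlagSet("manager", flag.ExitOnError)
	fs.String("metrics-bind-address", ":8080", "The address the metric endpoint binds to.")

	// The updated deployment also passes --metrics-secure. Because that flag was
	// never defined, Parse prints "flag provided but not defined: -metrics-secure"
	// plus the usage text and exits non-zero, matching the log output above.
	_ = fs.Parse([]string{"--metrics-bind-address=:8443", "--metrics-secure"})
}
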

Contributor


Note: I'm deploying it via CLOUD_PROVIDER=aws make deploy from cloud-api-adaptor...

Contributor


I see, the problem is that it uses the quay.io/confidential-containers/operator:v0.10.0 image by default, which doesn't support this (and another problem is that it uses different labels, resulting in kata not being installed :-/). I'll have to figure out why, and whether it's a default or something I set, because if it's the default it needs to be adjusted in order to work out of the box...

Contributor


Well, looking at the history of https://github.com/confidential-containers/operator/blob/main/config/manager/kustomization.yaml I guess we forgot to update and re-bump to latest a couple of times... @wainersm @fitzthum @fidencio what do you think?

Contributor Author


See #488

config/release looks to be the stable deployment with the pinned release versions

Contributor


I see, so let's hope it'll be updated soon, as currently the latest CAA fails to install due to the change of labels and the --metrics-secure support.
