Knative service status stuck at "Unknown" and "Uninitialized" #15753

Open
hyde404 opened this issue Feb 6, 2025 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


hyde404 commented Feb 6, 2025

What version of Knative?

1.17.0
net-istio: 1.17.0
istio: 1.24.2

Expected Behavior

The Knative service status should be “Ready” instead of hanging in the “Unknown” state.

Actual Behavior

When I deploy a Knative service in an EKS cluster, it remains in “Unknown” status until the Istio ingress controllers are restarted, even though the application can be reached.
It then switches to “Ready”, and the next application deployed is again stuck in “Unknown”, and so on.

 kubectl get ksvc gsvc-serving-db07943b -n eb7d5189

NAME                    URL                                                        LATESTCREATED                 LATESTREADY                   READY     REASON
gsvc-serving-db07943b   http://test-eb7d5189.serverless-dev.xyz.crashcourse.com   gsvc-serving-db07943b-00001   gsvc-serving-db07943b-00001   Unknown   Uninitialized

The application is exposed through a load balancer and is reachable:

curl https://test-eb7d5189.serverless-dev.xyz.crashcourse.com

Hello World!

Here are the details of the knative service status:

kubectl get ksvc -n eb7d5189 gsvc-serving-db07943b -ojsonpath='{.status}' | jq 
{
  "address": {
    "url": "http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local"
  },
  "conditions": [
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "status": "True",
      "type": "ConfigurationsReady"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "Ready"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "RoutesReady"
    }
  ],
  "latestCreatedRevisionName": "gsvc-serving-db07943b-00001",
  "latestReadyRevisionName": "gsvc-serving-db07943b-00001",
  "observedGeneration": 1,
  "traffic": [
    {
      "latestRevision": true,
      "percent": 100,
      "revisionName": "gsvc-serving-db07943b-00001"
    }
  ],
  "url": "http://test-eb7d5189.serverless-dev.xyz.crashcourse.com"
}

For the load balancing I use an AWS NLB, and everything seems to be OK: all the targets (15021, 443, 80) are healthy.
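For reference, this is roughly how I check the NLB target health (the target group name filter and ARN below are placeholders):

# list the target groups fronting the istio gateways, then inspect their target health
aws elbv2 describe-target-groups --query "TargetGroups[?contains(TargetGroupName, 'istio')].TargetGroupArn" --output text
aws elbv2 describe-target-health --target-group-arn <target-group-arn>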

I also noticed a couple of log entries that are probably related to the issue.

{
    "severity": "ERROR",
    "timestamp": "2025-02-06T13:25:22.20239663Z",
    "logger": "net-istio-controller.istio-ingress-controller",
    "caller": "status/status.go:421",
    "message": "Probing of https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443 failed, IP: 100.64.174.122:443, ready: false, error: error roundtripping https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443/healthz: read tcp 100.64.162.193:36768->100.64.174.122:443: read: connection reset by peer (depth: 0)",
    "commit": "4dff29e-dirty",
    "knative.dev/controller": "knative.dev.net-istio.pkg.reconciler.ingress.Reconciler",
    "knative.dev/kind": "networking.internal.knative.dev.Ingress",
    "knative.dev/traceid": "de3b3adb-b689-4a4c-b4d4-39c22a0911ba",
    "knative.dev/key": "eb7d5189/gsvc-serving-db07943b",
    "stacktrace": "knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/[email protected]/pkg/status/status.go:421\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/[email protected]/pkg/status/status.go:306"
}

and

{
    "severity": "WARNING",
    "timestamp": "2025-02-06T13:25:22.997317236Z",
    "logger": "controller",
    "caller": "route/reconcile_resources.go:227",
    "message": "Failed to update k8s service",
    "commit": "6265a8e",
    "knative.dev/pod": "controller-85c449cd99-97hgw",
    "knative.dev/controller": "knative.dev.serving.pkg.reconciler.route.Reconciler",
    "knative.dev/kind": "serving.knative.dev.Route",
    "knative.dev/traceid": "e6e9c300-658b-4feb-8b59-e2bf6fa95bd1",
    "knative.dev/key": "eb7d5189/gsvc-serving-db07943b",
    "error": "failed to fetch loadbalancer domain/IP from ingress status"
}

I'd also like to point out that I looked at the Route and the Knative Ingress; their statuses are as follows.
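For reference, those statuses can be dumped with something like this (resource names taken from the controller logs above):

kubectl get routes.serving.knative.dev -n eb7d5189 gsvc-serving-db07943b -ojsonpath='{.status}' | jq
kubectl get ingresses.networking.internal.knative.dev -n eb7d5189 gsvc-serving-db07943b -ojsonpath='{.status}' | jq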

route

{
  "address": {
    "url": "http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local"
  },
  "conditions": [
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "status": "True",
      "type": "AllTrafficAssigned"
    },
    {
      "lastTransitionTime": "2025-02-06T15:03:04Z",
      "message": "Certificate route-55cecb6b-26fc-4fda-8e42-1d7d9c8fdd2b is not ready downgrade HTTP.",
      "reason": "HTTPDowngrade",
      "status": "True",
      "type": "CertificateProvisioned"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "IngressReady"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "Ready"
    }
  ],
  "observedGeneration": 1,
  "traffic": [
    {
      "latestRevision": true,
      "percent": 100,
      "revisionName": "gsvc-serving-db07943b-00001"
    }
  ],
  "url": "http://test-eb7d5189.serverless-dev.xyz.crashcourse.com"
}

ingress

{
  "conditions": [
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "LoadBalancerReady"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "status": "True",
      "type": "NetworkConfigured"
    },
    {
      "lastTransitionTime": "2025-02-06T13:43:16Z",
      "message": "Waiting for load balancer to be ready",
      "reason": "Uninitialized",
      "status": "Unknown",
      "type": "Ready"
    }
  ],
  "observedGeneration": 1
}

Strange findings

Logs from istiod

2025-02-03T10:57:39.622954Z	info	ads	Push debounce stable[112] 1 for config Secret/eb7d5189/gsvc-pull-2f42fda6-serving-f61c3f70: 100.240948ms since last change, 100.240879ms since last push, full=false

2025-02-03T10:57:39.732479Z	info	model	Incremental push, service gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local at shard Kubernetes/Kubernetes has no endpoints

2025-02-03T10:57:39.756497Z	info	model	Full push, new service eb7d5189/gsvc-serving-db07943b-00001.eb7d5189.svc.cluster.local
2025-02-03T10:57:39.924255Z	info	ads	Push debounce stable[113] 5 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local and 1 more configs: 100.548746ms since last change, 200.632268ms since last push, full=true
        "outbound|443||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
        "outbound|8012||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
        "outbound|8022||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
        "outbound|80||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
        "outbound|9090||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
        "outbound|9091||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {}
2025-02-03T10:58:02.542487Z	info	model	Full push, new service eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local
2025-02-03T10:58:02.722779Z	info	ads	Push debounce stable[114] 3 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local and 1 more configs: 100.661715ms since last change, 180.224615ms since last push, full=true
2025-02-03T10:58:02.961683Z	info	ads	Push debounce stable[115] 3 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b.eb7d5189.svc.cluster.local and 2 more configs: 100.299834ms since last change, 160.890523ms since last push, full=true

Services before ingress-controller restart

kubectl get svc -n eb7d5189    

NAME                                  TYPE           CLUSTER-IP       EXTERNAL-IP                                         PORT(S)                                              AGE
gsvc-serving-db07943b                 ExternalName   <none>           test-eb7d5189.serverless-dev.xyz.crashcourse.com   80/TCP                                               3h37m
gsvc-serving-db07943b-00001           ClusterIP      172.20.247.29    <none>                                              80/TCP,443/TCP                                       3h37m
gsvc-serving-db07943b-00001-private   ClusterIP      172.20.132.196   <none>                                              80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP   3h37m

Services after ingress-controller restart

kubectl get svc -n eb7d5189

NAME                                  TYPE           CLUSTER-IP       EXTERNAL-IP                                             PORT(S)                                              AGE
gsvc-serving-db07943b                 ExternalName   <none>           knative-local-gateway.istio-system.svc.cluster.local   80/TCP                                               3h37m
gsvc-serving-db07943b-00001           ClusterIP      172.20.247.29    <none>                                                  80/TCP,443/TCP                                       3h37m
gsvc-serving-db07943b-00001-private   ClusterIP      172.20.132.196   <none>                                                  80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP   3h37m

The EXTERNAL-IP of the ExternalName service changed from test-eb7d5189.serverless-dev.xyz.crashcourse.com to knative-local-gateway.istio-system.svc.cluster.local.
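A quick way to watch just that field is something like:

kubectl get svc gsvc-serving-db07943b -n eb7d5189 -o jsonpath='{.spec.externalName}'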

Some tests

From a "Ready" service

curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-5fb72450-00001.eb7d5189.svc.cluster.local

200

curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-5fb72450.eb7d5189.svc.cluster.local

200

From the "Unknown" service

curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-db07943b-00001.eb7d5189.svc.cluster.local

200

curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local

404
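As a follow-up check, hitting the local gateway directly with the same Host header should tell whether the 404 comes from the stale ExternalName target or from missing routes on the gateway (just a sketch, not run yet):

curl -o /dev/null -s -w "%{http_code}\n" -H "Host: gsvc-serving-db07943b.eb7d5189.svc.cluster.local" http://knative-local-gateway.istio-system.svc.cluster.local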

Steps to Reproduce the Problem

  • Set up the Istio ingress controllers (this creates an AWS NLB, so you also need the AWS Load Balancer Controller)
  • Install Knative
  • Deploy a Knative service

Ingress controllers

My setup has some particularities: I use three different ingress controllers, configured with the Helm values below.

helmCharts:
  - includeCRDs: true
    name: gateway
    namespace: istio-system
    releaseName: istio-ingressgateway
    repo: https://istio-release.storage.googleapis.com/charts
    version: 1.24.2
    valuesInline:
      service:
        type: ClusterIP

  - includeCRDs: true
    name: gateway
    namespace: istio-system
    releaseName: istio-internal-ingressgateway
    repo: https://istio-release.storage.googleapis.com/charts
    version: 1.24.2
    valuesInline:
      service:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internal
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      podAnnotations:
        proxy.istio.io/config: |
          {
            "gatewayTopology": {
              "proxyProtocol": {}
            }
          }
      labels:
        app: istio-internal-ingressgateway
        istio: ingressgateway

  - includeCRDs: true
    name: gateway
    namespace: istio-system
    releaseName: istio-external-ingressgateway
    repo: https://istio-release.storage.googleapis.com/charts
    version: 1.24.2
    valuesInline:
      service:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      podAnnotations:
        proxy.istio.io/config: |
          {
            "gatewayTopology": {
              "proxyProtocol": {}
            }
          }
      labels:
        app: istio-external-ingressgateway
        istio: ingressgateway

In case you're wondering, I use the proxy config to preserve the client source IPs and match them in an AuthorizationPolicy afterwards.
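To make that concrete, the kind of policy these source IPs feed into looks roughly like this (policy name and CIDR are placeholders):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-office-range   # placeholder name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: ALLOW
  rules:
    - from:
        - source:
            # populated from the proxy-protocol source address
            remoteIpBlocks:
              - 203.0.113.0/24   # placeholder CIDR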

Knative

I deploy Knative using the knative-operator as follows:

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
  annotations:
    gladiator.app/name: knative-operator
spec:
  version: "1.17"
  high-availability:
    replicas: 2
  config:
    istio:
      gateway.knative-serving.knative-external-ingress-gateway: istio-external-ingressgateway.istio-system.svc.cluster.local
      gateway.knative-serving.knative-internal-ingress-gateway: istio-internal-ingressgateway.istio-system.svc.cluster.local
    defaults:
      max-revision-timeout-seconds: "3600"
      revision-timeout-seconds: "1800"
      revision-response-start-timeout-seconds: '600'
    autoscaler:
      allow-zero-initial-scale: "true"
      enable-scale-to-zero: "true"
      initial-scale: "0"
    deployment:
      progress-deadline: "3600s"
    features:
      autodetect-http2: enabled
      kubernetes.containerspec-addcapabilities: disabled
      kubernetes.podspec-affinity: enabled
      kubernetes.podspec-dnsconfig: disabled
      kubernetes.podspec-dnspolicy: disabled
      kubernetes.podspec-dryrun: allowed
      kubernetes.podspec-fieldref: disabled
      kubernetes.podspec-hostaliases: disabled
      kubernetes.podspec-init-containers: enabled
      kubernetes.podspec-nodeselector: enabled
      kubernetes.podspec-persistent-volume-claim: enabled
      kubernetes.podspec-persistent-volume-write: enabled
      kubernetes.podspec-priorityclassname: disabled
      kubernetes.podspec-runtimeclassname: enabled
      kubernetes.podspec-schedulername: disabled
      kubernetes.podspec-securitycontext: enabled
      kubernetes.podspec-tolerations: enabled
      kubernetes.podspec-topologyspreadconstraints: disabled
      kubernetes.podspec-volumes-emptydir: enabled
      kubernetes.podspec-volumes-hostpath: enabled
      multi-container: enabled
      queueproxy.mount-podinfo: disabled
      tag-header-based-routing: disabled
      multi-container-probing: enabled
    gc:
      min-non-active-revisions: "0"
      max-non-active-revisions: "0"
      retain-since-create-time: "disabled"
      retain-since-last-active-time: "disabled"
    leader-election:
      lease-duration: 60s
    logging:
      loglevel.activator: info
      loglevel.autoscaler: info
      loglevel.controller: info
      loglevel.hpaautoscaler: info
      loglevel.net-certmanager-controller: info
      loglevel.net-contour-controller: info
      loglevel.net-istio-controller: info
      loglevel.queueproxy: info
      loglevel.webhook: info
    network:
      auto-tls: Disabled
      domain-template: '{{index .Annotations "service.serverless.xyz.crashcourse.com/hostname"}}.{{.Domain}}'
      ingress-class: "istio.ingress.networking.knative.dev"
    observability:
      logging.enable-probe-request-log: "true"
      logging.enable-request-log: "true"
      logging.request-log-template: >-
        {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js
        .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}",
        "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}",
        "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js
        .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js
        .Request.Referer}}", "latency": {{.Response.Latency}}, "latencyNew":
        {{.Response.Latency}}, "protocol": "{{.Request.Proto}}"}, "traceId":
        "{{index .Request.Header "X-B3-Traceid"}}"}
      metrics.backend-destination: prometheus
      metrics.request-metrics-backend-destination: prometheus
    tracing:
      backend: none

The domain-template is tied to an in-house operator of ours, so never mind that.

Knative Service

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: gsvc-serving-db07943b
  namespace: eb7d5189
  annotations:
    gladiator/url-prefix: test-
    service.serverless.xyz.crashcourse.com/endpoint: test-eb7d5189
    service.serverless.xyz.crashcourse.com/hostname: test-eb7d5189
  labels:
    app.kubernetes.io/component: serving
    app.kubernetes.io/part-of: service
    serverless.xyz.crashcourse.com/service-name: test
    service.serverless.xyz.crashcourse.com/endpoint: test-eb7d5189
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/max-scale: "1"
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/min-scale: "1"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          name: hello
          ports:
            - containerPort: 8080
          env:
            - name: TARGET
              value: "World"

The same goes for the annotations/labels; they are linked to that operator.

hyde404 added the kind/bug label on Feb 6, 2025

skonto (Contributor) commented Feb 7, 2025

Hi @hyde404 ,

The external-ip from ExternalName turned from test-eb7d5189.serverless-dev.xyz.crashcourse.com to knative-local-gateway.istio-system.svc.cluster.local.

The ExternalName should point to Istio; it is used for different purposes, e.g. traffic splitting.

I haven't checked all the details yet, but is that ExternalName somehow being exposed on the AWS LB directly (due to your ingresses), or is Istio not picking up the changes? Could you try a more standard approach, as in the Knative docs, as a smoke test?
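For instance, something along the lines of the default YAML install from the docs (versions assumed to match yours):

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.17.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.17.0/serving-core.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.17.0/net-istio.yaml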

Probing of https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443 failed, IP: 100.64.174.122:443, ready: false, error: error roundtripping https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443/healthz: read tcp 100.64.162.193:36768->100.64.174.122:443: read: connection reset by peer (depth: 0)",

This is the reason you see the load balancer not being ready. I am wondering why HTTPS is used; which Istio mode do you use, mTLS?
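A quick way to check what mTLS policy is actually in effect (just a sketch):

kubectl get peerauthentication -A
kubectl get destinationrules -A -o yaml | grep -B4 -A2 'tls:'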

Note: unfortunately I don't have an AWS cluster to test with, so I am guessing.


hyde404 (Author) commented Feb 11, 2025

Hi @skonto,

Thanks for your reply!
The Istio gateway mode I use is "simple", and the one set in the ingress controller is apparently mTLS (controlPlaneAuthPolicy: MUTUAL_TLS).
I tried a more standard approach by installing knative/istio/net-istio using this piece of documentation and got the exact same result.
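For context, the HTTPS server on the external gateway is configured roughly like this (sketch; the gateway name and credential name are placeholders):

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: external-gateway   # placeholder name
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: wildcard-serverless-dev-cert   # placeholder secret
      hosts:
        - "*.serverless-dev.xyz.crashcourse.com"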

By removing

        proxy.istio.io/config: |
          {
            "gatewayTopology": {
              "proxyProtocol": {}
            }
          }

from podAnnotations, it works, but we lose the ability to keep the client source IP, which is not desirable.
A couple of combinations based on this proxy config have been tested, but we had no luck.
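For illustration, one variation along those lines (a sketch, not necessarily one of the exact combinations tested) is adding numTrustedProxies next to proxyProtocol:

proxy.istio.io/config: |
  {
    "gatewayTopology": {
      "proxyProtocol": {},
      "numTrustedProxies": 1
    }
  }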

It seems like a probe, maybe from net-istio, has issues.
Moreover, we came across this feature request, which really looks like what we're facing right now.
