
Activator health checks #15575

Open
thorweijie opened this issue Oct 16, 2024 · 2 comments
Labels
kind/question Further information is requested lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@thorweijie

Ask your question here:

We have a Kubernetes cluster with many inference services. After all the inference services were restarted, we noticed that the istio-proxy containers in the activator pods had high CPU usage and health checks were failing with response code 0, so we set target burst capacity to 0 to bypass the activator, which fixed the issue. However, despite being bypassed, the activator pods kept attempting health checks (still failing with response code 0) until they were restarted. We would like to know whether the health checks performed by the activator are cached, and whether their frequency can be configured.
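For reference, target burst capacity can be set to 0 per revision via an annotation on the Knative Service's revision template (the service name below is hypothetical):

```yaml
# Per-revision setting: a target burst capacity of 0 takes the
# activator off the data path once capacity is sufficient.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-inference-service   # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target-burst-capacity: "0"
```

It can also be set cluster-wide through the `target-burst-capacity` key in the `config-autoscaler` ConfigMap.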

@thorweijie thorweijie added the kind/question Further information is requested label Oct 16, 2024
@skonto
Contributor

skonto commented Oct 23, 2024

Hi @thorweijie!

After all the inference services were restarted, we noticed the istio-proxy container in activator pods were having high cpu usage and health checks were failing with response code 0

Which health checks were failing, the activator's own?

We noticed that despite being skipped, the activator pods were still trying to perform health checks with response code 0 until they were restarted. We would like to know if the health checks for activator are cached, and whether the frequency of the health checks can be configured?

The probing mechanism starts when endpoints are created or updated, with a default frequency of 200ms.
If probing finishes successfully, you should see a message like the following, assuming you have enabled debug logging for the activator:

{"severity":"DEBUG","timestamp":"2024-10-23T14:20:52.082125337Z","logger":"activator","caller":"net/revision_backends.go:348","message":"Done probing, got 1 healthy pods","commit":"0abee66","knative.dev/controller":"activator","knative.dev/pod":"activator-8675c9944c-mdfj9","knative.dev/key":"default/autoscale-go-00001"}
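For context, the activator's log level is controlled by the `config-logging` ConfigMap in the `knative-serving` namespace; a minimal fragment to turn on debug logging:

```yaml
# Set the activator's log level to debug so probe-completion
# messages like the one above become visible.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-logging
  namespace: knative-serving
data:
  loglevel.activator: "debug"
```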

Once all pods are ready (and stay that way), probing should stop. The idea is that the activator is on standby to handle traffic, so each activator instance needs to know the ready targets in order to route traffic to them if needed.
AFAIK there is no caching. Maybe @ReToCode or @dprotaso have more to say here.


This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2025