-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix certgen job race condition #5082
base: main
Are you sure you want to change the base?
fix certgen job race condition #5082
Conversation
Signed-off-by: Ryan Hristovski <[email protected]>
/retest |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5082 +/- ##
==========================================
- Coverage 66.85% 66.84% -0.01%
==========================================
Files 211 211
Lines 32916 32920 +4
==========================================
Hits 22007 22007
- Misses 9584 9587 +3
- Partials 1325 1326 +1 ☔ View full report in Codecov by Sentry. |
The hook-weight should not make any difference since RBAC resources are installed by helm before |
@shahar-h ArgoCD doesn't use |
According to argocd install order |
@shahar-h I was able to replicate my error with the following code, using a server-side apply... but its very inconsistent (it happens about 1/20 times, trying to understand why, and to see if the
Setting to draft while I confirm this. |
Signed-off-by: Ryan Hristovski <[email protected]>
Signed-off-by: Ryan Hristovski <[email protected]>
@shahar-h I can't understand why this is failing, my only theory is that the order is not respected when it comes to hooks... regardless of being able to replicate the error, the more I thought about it, I don't think having a pre-install hook for RBAC makes sense. The object persists throughout and they are just recreated each time a sync is triggered with nothing new added, I don't see any benefit in them being recreated each time theres a sync. I updated this PR to remove the pre-install annotations on RBAC, let me know what you think. Edit: will fix tests if I get positive consensus |
Envoy-gateway certgen job frequently goes into a crash loop when running a sync using ArgoCD in our environment with the following error:
Though, sometimes this job succeeds with no problems...
I believe this inconsistency is due to a race condition between the RBAC resources and the certgen job (both installed with
pre-install
). Since the RBAC resources do not exist continuously, and are only applied on installation and then removed, I believe the job Is trying to start before the permissions are available. After setting thehook-weight
annotation in our environment we no longer see this problem.Another solution could be to have the RBAC resources exist continuously without the
pre-hook
but I imagine this was a conscious security decision.