-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: unify nodeclass status and termination controllers to prevent ra… #7597
fix: unify nodeclass status and termination controllers to prevent ra… #7597
Conversation
…ce conditions from leaking instance profiles As part of a recent investigation, I found that the nodeclass status and termination controllers race against each other at times, and depending on who wins the race, the instance profile can be leaked. Discussed options with the team and merging these two controllers is the most straightforward thing to do at the moment. All other solutions are not a 100% failsafe with the exception of adding new finalizers for instance profiles but they have their own issues due to being backwards incompatible. The primary downside of this change is that nodeclaim delete events can trigger a reconciliation now if the nodeclass is not deleted/deleting but looking through the current list of reconcilers, the impact of that should be minimal since all we're doing is making a few additional calls to EC2 for subnets and security groups and the additional calls for Instance Profile reconciliation only when nodeclaims are getting deleted.
✅ Deploy Preview for karpenter-docs-prod ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Pull Request Test Coverage Report for Build 12776979373Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/karpenter snapshot
Snapshot successfully published to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/karpenter snapshot
…inations controllers are merged into a single nodeclass controller for 1.2.0+
6935fcb
to
c45d481
Compare
Snapshot successfully published to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Fixes #N/A
Description
As part of a recent investigation, I found that the nodeclass status and termination controllers race against each other at times, and depending on who wins the race, the instance profile can be leaked. Discussed options with the team and merging these two controllers is the most straightforward thing to do at the moment. All other solutions are not a 100% failsafe with the exception of adding new finalizers for instance profiles but they have their own issues due to being backwards incompatible.
The primary downside of this change is that nodeclaim delete events can trigger a reconciliation now if the nodeclass is not deleted/deleting but looking through the current list of reconcilers, the impact of that should be minimal since all we're doing is making a few additional calls to EC2 for subnets and security groups and the additional calls for Instance Profile reconciliation only when nodeclaims are getting deleted.
How was this change tested?
make presubmit
and E2E testing via CI as part of this PR.Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.