fix: reflect pod failed creates on the chaperon status #227
base: master
a64ae22 to d377bf1
This could use an e2e test.
```go
if podChaperon.Status.Phase == "" {
	return true
}
for _, condition := range podChaperon.Status.Conditions {
```
I don't understand. If the candidate pod creation failed, why would podChaperon.Status.Phase not be empty?
So prior to this change, the following happens:
- Attempt to create the pod: `pod, err = c.kubeclientset.CoreV1().Pods(podChaperon.Namespace).Create(ctx, newPod(podChaperon), metav1.CreateOptions{})`
- Pod creation fails, and the function returns early: `return nil, fmt.Errorf("cannot create pod for pod chaperon %v", err)`
- Because of that early exit, the status-setting code on the chaperon (`if needStatusUpdate {`) never runs, so the chaperon status is never updated.
After this change, only the condition is updated, so the phase should still be empty, right? So line 235 (return true) is never executed. Checking the condition here appears to be unnecessary.
Whoops, I misunderstood your earlier question. Good catch! I forgot to also set the phase here. I think it would be reasonable to set the phase to `Pending` in the failed-pod-create case, since `Failed` generally implies container termination.
7f2c1b7 to 93a6633
93a6633 to 2db32a6
This fixes a bug where pod chaperons in a target cluster can delay the scheduling loop if their pods fail to be created and no status is set on the chaperon.
This change sets the `PodScheduled` condition with reason `PodFailedCreate` and checks for it in the proxy filter step. The pod creation is requeued and retried. On success, the chaperon status inherits the pod's status, as before.
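As a rough illustration of the filter-side check, the sketch below separates "no status reported yet" from "pod create failed". The `chaperonStatus`/`condition` types, the `classify` function and its states are hypothetical; how the real proxy filter acts on each case is not shown. The failed-create condition is checked before the empty-phase case, so the failure is still detected even if the phase were left empty.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the chaperon status fields.
type condition struct {
	Type   string
	Reason string
}

type chaperonStatus struct {
	Phase      string
	Conditions []condition
}

type candidateState int

const (
	awaitingStatus candidateState = iota // empty status: nothing reported yet
	createFailed                         // PodScheduled condition with reason PodFailedCreate
	reported                             // any other reported status
)

// classify looks for the failed-create condition first, then falls back to the phase.
func classify(s chaperonStatus) candidateState {
	for _, c := range s.Conditions {
		if c.Type == "PodScheduled" && c.Reason == "PodFailedCreate" {
			return createFailed
		}
	}
	if s.Phase == "" {
		return awaitingStatus
	}
	return reported
}

func main() {
	failed := chaperonStatus{
		Phase:      "Pending",
		Conditions: []condition{{Type: "PodScheduled", Reason: "PodFailedCreate"}},
	}
	fmt.Println(classify(chaperonStatus{}), classify(failed), classify(chaperonStatus{Phase: "Running"}))
}
```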