Preempt ordering issue #3962
Comments
I will check this later
Thank you for the scene you mentioned, which is very detailed.
@lowang-bh @Monokaix @JesseStutler
I don't see how your PR fixes the problem you describe. In that scenario it makes no difference whether the Job is sorted by UUID or by Name: if MinAvailable is 1 and the gang plugin is turned on, neither job-1 nor job-3 can be preempted below 1 replica.
preempt-dev-job3-low-nginx-1 and preempt-dev-job1-low-priority-nginx-1 each have and need 4 GPUs, split across 2 separate nodes. preempt-dev-job3-low-nginx-1 -> node 1 (2 GPUs), node 2 (2 GPUs)
I believe we have MinAvailable set to 0 (or the default). What we found is that when the victim list (tasks) on a node was sorted by UUID rather than by name, the sort order wasn't consistent across nodes. Does that make sense?
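To make the ordering concern concrete, here is a minimal sketch in Python. It is not Volcano's code: the `Task` type, the pod names, and the UID values are all hypothetical, and the layout (one pod of each job on each node) is assumed from the description above. It only shows why sorting each node's victim list by a random identifier can put a different job first on each node, while sorting by name puts the same job first everywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str  # hypothetical pod name; pods of one job share a prefix
    uid: str   # hypothetical random identifier assigned at creation time
    job: str   # job the pod belongs to


def first_victim_job(victims_by_node, key):
    """For each node, sort its victim list by `key` and return the job of the first victim."""
    return {
        node: sorted(tasks, key=key)[0].job
        for node, tasks in victims_by_node.items()
    }


# Assumed layout: each node hosts one pod from each of the two low-priority jobs.
victims_by_node = {
    "node-1": [Task("job1-pod-0", "f3a1", "job1"), Task("job3-pod-0", "0b7c", "job3")],
    "node-2": [Task("job1-pod-1", "1d42", "job1"), Task("job3-pod-1", "9e05", "job3")],
}

by_uid = first_victim_job(victims_by_node, key=lambda t: t.uid)
by_name = first_victim_job(victims_by_node, key=lambda t: t.name)

print("sorted by uid: ", by_uid)   # the first victim's job differs between nodes
print("sorted by name:", by_name)  # the first victim's job is the same on every node
```

With these made-up UIDs, the UID ordering puts a job3 pod first on one node and a job1 pod first on the other, so both jobs lose a pod. The name ordering puts job1 first on both nodes.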
Description
We have 2 nodes with 4 GPUs each, and we have the following jobs deployed:
preempt-dev-job3-low-nginx-1, which is part of a gang job.
preempt-dev-job3-low-nginx-1 and preempt-dev-job1-low-priority-nginx-1 always show as second on the victims list (those are the ones picked as victims).
Steps to reproduce the issue
Describe the results you received and expected
We expect the ordering of preemption victims to always be consistent.
What version of Volcano are you using?
1.10
Any other relevant information
No response