
Flaky Test: Topology Aware Scheduling when Negative scenarios for ClusterQueue configuration should mark TAS ClusterQueue as inactive if used in cohort #4033

Open
tenzen-y opened this issue Jan 22, 2025 · 2 comments
Labels: kind/bug, kind/flake

@tenzen-y (Member) commented:

What happened:

Failure on "TopologyAwareScheduling Suite: [It] Topology Aware Scheduling when Negative scenarios for ClusterQueue configuration should mark TAS ClusterQueue as inactive if used in cohort"

{Timed out after 5.000s.
Expected object to be comparable, diff:   []v1.Condition{
  	{
  		... // 2 ignored and 2 identical fields
  		Reason: "NotSupportedWithTopologyAwareScheduling",
  		Message: strings.Join({
  			"Can't admit new workloads: TAS is not supported for cohorts",
- 			`, there is no Topology "default" for TAS flavor "tas-flavor"`,
  			".",
  		}, ""),
  	},
  }
In [It] at: /home/prow/go/src/kubernetes-sigs/kueue/test/integration/tas/tas_test.go:112 @ 01/22/25 04:30:36.473
}
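
For context, the "Expected object to be comparable" wording comes from Gomega's BeComparableTo matcher, so the failing check is presumably an Eventually over the full conditions slice, message included. A rough sketch of that kind of assertion follows; the variable names, the expected message, and the ignore option are assumptions for illustration, not the exact code at tas_test.go:112:

// Assumed imports:
//   gomega "github.com/onsi/gomega"
//   metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
//   "sigs.k8s.io/controller-runtime/pkg/client"
//   kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
//   "sigs.k8s.io/kueue/test/util"

// Hypothetical sketch of the failing assertion: the whole condition,
// including Message, is compared; the "2 ignored" fields in the diff
// are presumably LastTransitionTime and ObservedGeneration.
gomega.Eventually(func(g gomega.Gomega) {
	createdCq := &kueue.ClusterQueue{}
	g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(clusterQueue), createdCq)).To(gomega.Succeed())
	g.Expect(createdCq.Status.Conditions).To(gomega.BeComparableTo([]metav1.Condition{{
		Type:    kueue.ClusterQueueActive,
		Status:  metav1.ConditionFalse,
		Reason:  "NotSupportedWithTopologyAwareScheduling",
		Message: "Can't admit new workloads: TAS is not supported for cohorts.",
	}}, util.IgnoreConditionTimestampsAndObservedGeneration))
}, util.Timeout, util.Interval).Should(gomega.Succeed())

The diff suggests the live condition still carried the extra `, there is no Topology "default" for TAS flavor "tas-flavor"` clause when the 5s timeout hit, a clause that should drop out once the Topology object has been processed.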

What you expected to happen:

The test should have passed.

How to reproduce it (as minimally and precisely as possible):

https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-kueue-test-integration-release-0-9/1881918244332244992


Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kueue version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@tenzen-y added the kind/bug label on Jan 22, 2025
@tenzen-y (Member, Author) commented:

/kind flake

@k8s-ci-robot added the kind/flake label on Jan 22, 2025
@mimowo (Contributor) commented on Jan 22, 2025:

/assign
Let me check; this looks like a race between processing the Topology ADDED event and composing the condition message.

I think we can just relax the assertion and check only the reason "NotSupportedWithTopologyAwareScheduling"; the message is already covered at the unit-test level. A sketch of the relaxed check is below.
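
A minimal sketch of the relaxed assertion, under the same assumptions as the sketch above (kueue.ClusterQueueActive, util.Timeout, and util.Interval are taken from Kueue's API and test helpers; the surrounding variables are illustrative, not the actual test code):

// Additional assumed import:
//   "k8s.io/apimachinery/pkg/api/meta"

// Relaxed check: assert only Type/Status/Reason of the Active
// condition; the exact Message stays covered by unit tests.
gomega.Eventually(func(g gomega.Gomega) {
	createdCq := &kueue.ClusterQueue{}
	g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(clusterQueue), createdCq)).To(gomega.Succeed())
	cond := meta.FindStatusCondition(createdCq.Status.Conditions, kueue.ClusterQueueActive)
	g.Expect(cond).NotTo(gomega.BeNil())
	g.Expect(cond.Status).To(gomega.Equal(metav1.ConditionFalse))
	g.Expect(cond.Reason).To(gomega.Equal("NotSupportedWithTopologyAwareScheduling"))
}, util.Timeout, util.Interval).Should(gomega.Succeed())

This keeps the integration test insensitive to when the Topology ADDED event lands, while still verifying that the ClusterQueue is marked inactive for the right reason.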
