Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hugepages 2Mi and 1Gi fields to ResourceDescription and pass them to the statefulset #2311

Merged
merged 13 commits into from
Jan 4, 2024

Conversation

silenium-dev
Copy link
Contributor

@silenium-dev silenium-dev commented May 4, 2023

This is a simple approach to these issues and just allows hugepages requests and limits to be specified in the CR for postgres container and eventual sidecars.
At the moment, it is not possible to use the operator on clusters with enabled hugepages support, because postgres detects and tries to use hugepages, but fails to allocate as the limit on containers defaults to zero. This PR allows specifying a little amount of hugepages to prevent postgres from crashing.

Closes #1788 and closes #1549

@willstott101
Copy link

Is there anything blocking this from being accepted? This issue is currently preventing us from using postgres-operator. Would it be helpful to have a more minimal diff?

@silenium-dev
Copy link
Contributor Author

silenium-dev commented Jun 26, 2023

Would it be helpful to have a more minimal diff?

@willstott101 I've cleaned up the diff as suggested, I wasn't aware that GoLand had formatted the entire docs file.

@mmmcc666
Copy link

I've just built this locally and it works as hoped, allowing hugepage allocations to be passed to the db pods which then don't crash when running on k8s nodes with hugepages enabled 👍

@mmmcc666
Copy link

I've just built this locally and it works as hoped, allowing hugepage allocations to be passed to the db pods which then don't crash when running on k8s nodes with hugepages enabled 👍

However, it would be even more useful to be able to completely disable hugepages in the postgres pods by being able to pass huge_pages=off in as described here https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

@pieveee
Copy link

pieveee commented Jul 7, 2023

Any updates on this?

@Jan-M
Copy link
Member

Jan-M commented Oct 14, 2023

@FxKu myself also curious about this.

@@ -688,6 +688,25 @@ manifest the operator will raise the limits to the configured minimum values.
If no resources are defined in the manifest they will be obtained from the
configured [default requests](reference/operator_parameters.md#kubernetes-resource-requests).

### HugePages support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this comes with K8s 1.28

Can we also link https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Jan-M
Copy link
Member

Jan-M commented Oct 14, 2023

Turning it off can work already via Patroni/Postgres config

@@ -2979,6 +2978,98 @@ func TestGenerateResourceRequirements(t *testing.T) {
ResourceLimits: acidv1.ResourceDescription{CPU: "1", Memory: "2Gi"},
},
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we amend the other test to test for no huge pages set? Given its K8s 1.28 feature lets make sure its not set if not enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test to cover this

@Jan-M
Copy link
Member

Jan-M commented Oct 14, 2023

Tiny comments from my side

@USA-RedDragon
Copy link

USA-RedDragon commented Nov 7, 2023

Turning it off can work already via Patroni/Postgres config

This is true for a cluster that is already created, but when creating new clusters the initial bootstrap dies with a Bus Error regardless of the acid.zalan.do/v1postgresql's spec.postgresql.parameters.huge_pages being set to off. This has been a preventing me from using this in a cluster that also uses OpenEBS's Mayastor, which requires huge pages to be enabled on the host, without manually editing the statefulset.

I think this might be the behavior #2311 (comment) was referring to

It's also very possible there's an option I'm missing in the CRD that could accomplish this, I just have yet to find it.

It seems like @silenium-dev has gotten the review changes completed. Any chance you could give this a re-review @Jan-M?

@sigi-tw
Copy link

sigi-tw commented Nov 7, 2023

We faced a huge issue with this too: I verified that huge pages was configured and i only updated the value from 1024 to 1536 to have enough for mayastore. After that the databases destroyed themselves.

Due to me being sure that huge pages was available before, took ages to figure this out.

postgresql segfaulted btw (will create a ticket at postgres for this segfault).

@sj-porter-knime
Copy link

Any chance this PR could be approved/merged soon? 😸

@FxKu
Copy link
Member

FxKu commented Nov 16, 2023

@silenium-dev looks good. Lets get this ready to get merged. Only minor stuff is missing now. Can you

  1. add documentation in the reference,
  2. reflect it in the Postgres CRD in the chart,
  3. extend our complete manifest example?

Thanks in advance!!

@silenium-dev
Copy link
Contributor Author

@FxKu

  1. add documentation in the reference,

  2. reflect it in the Postgres CRD in the chart,

  3. extend our complete manifest example?

All done, please review the changes.

@silenium-dev
Copy link
Contributor Author

The e2e tests use the full cluster manifest, but the test environment doesn't support hugepages. I could set the hugepages requests and limits in the manifest to 0, but this kind of defeats the purpose of being a full example. I don't have much experience with GitHub Actions, would it be possible to enable support for hugepages in the test environment?

@XLordalX
Copy link

XLordalX commented Dec 6, 2023

Any updates on this? It's currently impossible to run this operator alongside mayastor without manually adding the resources.

@FxKu
Copy link
Member

FxKu commented Jan 4, 2024

👍

1 similar comment
@idanovinda
Copy link
Member

👍

@FxKu FxKu merged commit 9581ba9 into zalando:master Jan 4, 2024
5 checks passed
@FxKu FxKu added this to the 1.11.0 milestone Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done