
Operator attempting to apply namespace-scoped GrafanaNotificationPolicy to all Grafanas #1738

Open
marpears opened this issue Oct 30, 2024 · 10 comments
Labels
bug Something isn't working triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@marpears

Describe the bug
When multiple Grafana instances exist in the same cluster and a GrafanaNotificationPolicy is created, the operator attempts to apply it to all Grafana instances rather than only the one in the policy's namespace.

Version
5.14.0

To Reproduce
Steps to reproduce the behavior:

  1. Create Grafana instances in different namespaces in the same cluster
  2. Create a GrafanaNotificationPolicy
  3. Run oc describe GrafanaNotificationPolicy <> and observe the attempts to apply the GrafanaNotificationPolicy to other Grafana instances. In the example below, there are 8 Grafana instances in different namespaces; the operator attempts to apply the GrafanaNotificationPolicy to all 8 of them and succeeds for only one.
Status:
  Conditions:
    Last Transition Time:  2024-10-30T16:00:19Z
    Message:               Notification Policy failed to be applied for 7 out of 8 instances. Errors:
- hcp-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- il-core-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- il-vest-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- testapp-01-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- mytestapp-02-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- testapp06-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
- testval0-monitoring/grafana: applying notification policy: [PUT /v1/provisioning/policies][400] putPolicyTreeBadRequest {"message":"invalid object specification: receiver 'test' does not exist"}
    Observed Generation:  2
    Reason:               ApplyFailed
    Status:               False
    Type:                 NotificationPolicySynchronized
Events:                   <none>

Expected behavior
The GrafanaNotificationPolicy should only be applied to the Grafana instance in the same namespace.

Suspect component/Location where the bug might be occurring
unknown

Runtime (please complete the following information):

  • OS: Linux
  • Grafana Operator version: 5.14.0
  • Environment: OpenShift 4.16.17
  • Deployment type: OpenShift OLM

Additional information
Example GrafanaNotificationPolicy

---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: test
  namespace: testapp-monitoring
spec:
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  route:
    receiver: grafana-default-email
    routes:
      - receiver: test
        object_matchers:
          - - type
            - =
            - test
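The instanceSelector above uses only matchLabels. Under standard Kubernetes matchLabels semantics, a selector matches an object when every listed key is present with an equal value; the object's namespace plays no part in the match. The Go sketch below (a hypothetical stdlib-only helper that ignores matchExpressions) shows why a dashboards: "grafana" selector matches every Grafana instance carrying that label, in any namespace, so any namespace restriction has to be enforced as a separate check.

```go
package main

import "fmt"

// matchLabels reports whether labels satisfy a Kubernetes-style matchLabels
// selector: every selector key must be present with an equal value.
// matchExpressions are deliberately not modelled here.
func matchLabels(selector, labels map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"dashboards": "grafana"}

	// Hypothetical Grafana instances: two share the label across namespaces.
	instances := []struct {
		Namespace string
		Labels    map[string]string
	}{
		{"testapp-monitoring", map[string]string{"dashboards": "grafana"}},
		{"hcp-monitoring", map[string]string{"dashboards": "grafana"}},
		{"other-monitoring", map[string]string{"dashboards": "other"}},
	}

	for _, inst := range instances {
		fmt.Printf("%s matches: %v\n", inst.Namespace, matchLabels(selector, inst.Labels))
	}
}
```

Running it reports a match for both testapp-monitoring and hcp-monitoring, and no match for other-monitoring.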
@marpears marpears added bug Something isn't working needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 30, 2024
@theSuess
Member

theSuess commented Nov 4, 2024

Thanks for the report! I'll try to reproduce the issue this week and provide a fix!

@theSuess theSuess added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 4, 2024
@marpears
Author

marpears commented Nov 7, 2024

It seems this issue is not limited to GrafanaNotificationPolicy. For example, I have 2 Grafana instances deployed by the operator, each in a different namespace. Both Grafana instances have the same label.

In one of those namespaces I deploy a GrafanaDashboard whose instanceSelector matches that label. Although the label matches both Grafana instances, I would expect the dashboard to be applied only to the Grafana instance in the same namespace.

The .status field of the GrafanaDashboard reports "Dashboard was successfully applied to 2 instances", but it should only have been applied to the Grafana instance in the same namespace.

status:
  conditions:
    - lastTransitionTime: '2024-11-07T17:19:37Z'
      message: Dashboard was successfully applied to 2 instances
      observedGeneration: 1
      reason: ApplySuccessful
      status: 'True'
      type: DashboardSynchronized
  hash: fa829f4b4f5657900961b0f5952f40319de9673388858c48e69af293f9853ca6
  lastResync: '2024-11-07T17:24:38Z'
  uid: f90b4fa2a99d1861baa28c7fb96abb2ecb61be88

The Grafana Operator pod log confirms that it found 2 matching instances:

2024-11-07T17:29:39Z    INFO    GrafanaDashboardReconciler      found matching Grafana instances for dashboard  {"controller": "grafanadashboard", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDashboard", "GrafanaDashboard": {"name":"my-dashboard","namespace":"test-monitoring"}, "namespace": "test-monitoring", "name": "my-dashboard", "reconcileID": "5209bbd8-5fd2-48dd-adf0-6bccff1ab60a", "count": 2

However, when I check the Grafana instance in the other namespace, the dashboard has not been created there.

@pb82
Collaborator

pb82 commented Nov 18, 2024

@marpears, did you set allowCrossNamespaceImport on your dashboards? If it is not set, the operator should not import a dashboard into Grafana instances outside the dashboard's own namespace: https://github.com/grafana/grafana-operator/blob/master/controllers/dashboard_controller.go#L242
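The rule described above can be sketched as a second filter applied after label matching. This is a hypothetical illustration, not the code at the linked line: the instance type and allowedInstances function are invented for the example, and the boolean parameter stands in for the dashboard's allowCrossNamespaceImport setting.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for a label-matched Grafana CR.
type instance struct {
	Namespace, Name string
}

// allowedInstances keeps only the matched instances that a resource living in
// resourceNS may be applied to. With allowCrossNamespaceImport unset (false),
// that is just the instances in the resource's own namespace.
func allowedInstances(matched []instance, resourceNS string, allowCrossNamespaceImport bool) []instance {
	if allowCrossNamespaceImport {
		return matched
	}
	var out []instance
	for _, inst := range matched {
		if inst.Namespace == resourceNS {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	// Both instances matched the instanceSelector by label.
	matched := []instance{
		{"test-monitoring", "grafana"},
		{"other-monitoring", "grafana"},
	}
	for _, inst := range allowedInstances(matched, "test-monitoring", false) {
		fmt.Printf("apply to %s/%s\n", inst.Namespace, inst.Name)
	}
}
```

With the flag unset, only test-monitoring/grafana is selected, which is the behavior the issue expects.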

@pb82
Collaborator

pb82 commented Nov 18, 2024

There is a potential bug here: https://github.com/grafana/grafana-operator/blob/master/controllers/dashboard_controller.go#L246
We check the namespace before reassigning the loop variable. Before Go 1.22, this would mean we're checking the wrong instances.

@marpears, which Operator version were you using when you saw that issue?

Edit: we updated to Go 1.22 starting in version 5.9.1: 0988f65

@marpears
Author

Thanks @pb82, we're using Grafana Operator version 5.14. We don't have allowCrossNamespaceImport set.

I see the same problem affecting GrafanaNotificationPolicy and GrafanaFolder CRs as well.


This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label

@github-actions github-actions bot added the stale label Dec 19, 2024
@github-actions github-actions bot closed this as not planned Dec 27, 2024
@marpears
Author

marpears commented Jan 2, 2025

I missed the stale warning over the Christmas holidays, but this is still an issue. Hi @theSuess, can this be reopened, please?

@theSuess theSuess reopened this Jan 7, 2025
@theSuess
Member

theSuess commented Jan 7, 2025

Thanks for the ping @marpears!

As a status update: we're currently refactoring the instance mapping code (beginning with #1770).

@github-actions github-actions bot removed the stale label Jan 8, 2025
@mpham-brc

mpham-brc commented Jan 8, 2025

I am having a similar problem, but with 1 Grafana instance and 1 namespace. It worked once, but after a redeployment with no changes the operator can't seem to find the instance. The contact point and dashboard CRs were all applied correctly, and everything points to the same instance label.

message: Notification Policy was successfully applied to 0 instances
observedGeneration: 3
reason: ApplySuccessful
status: "True"
type: NotificationPolicySynchronized

@theSuess
Member

@mpham-brc, have you tried this with the latest version of the operator?

This issue should be fixed as of 5.16.


4 participants