Perfdata writers: disconnect handlers from signals in Pause() #9321

Al2Klimov · 2022-04-06T11:12:05Z

Disconnect the handlers in Pause(), as they would be re-connected in Resume() (HA).

Before, they stayed connected during pause and ended up connected X+1 times
after X split-brains, so the same data was written X+1 times.
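
The effect described above can be modelled in a few lines. This is only a minimal sketch in Python, not the Icinga 2 code (which is C++): the `Signal` and `Writer` classes, the `disconnect_on_pause` flag and the `writes_per_event` helper are all hypothetical names made up for illustration. It assumes that every Resume() connects one more handler and that, before this change, Pause() left existing handlers connected.

```python
class Signal:
    """Tiny stand-in for a signal that calls every connected handler."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)
        return handler  # returned so the caller can disconnect it later

    def disconnect(self, handler):
        self._handlers.remove(handler)

    def emit(self, item):
        for handler in list(self._handlers):
            handler(item)


class Writer:
    """Hypothetical perfdata writer with pause/resume (HA) behaviour."""

    def __init__(self, signal, disconnect_on_pause):
        self.signal = signal
        self.disconnect_on_pause = disconnect_on_pause
        self.written = []
        self._connection = None

    def resume(self):
        # Every resume connects the handler (again).
        self._connection = self.signal.connect(self._on_item)

    def pause(self):
        # Old behaviour (flag False): the handler stays connected.
        # New behaviour (flag True): the handler is disconnected here.
        if self.disconnect_on_pause and self._connection is not None:
            self.signal.disconnect(self._connection)
            self._connection = None

    def _on_item(self, item):
        self.written.append(item)


def writes_per_event(disconnect_on_pause, failovers):
    """Count how often one emitted item is written after some failovers."""
    signal = Signal()
    writer = Writer(signal, disconnect_on_pause)
    writer.resume()
    for _ in range(failovers):
        writer.pause()
        writer.resume()
    signal.emit("metric")
    return len(writer.written)


if __name__ == "__main__":
    print("old behaviour:", writes_per_event(False, 3))
    print("new behaviour:", writes_per_event(True, 3))
```

In this model, X pause/resume cycles without disconnecting leave X+1 handlers connected, so a single item is written X+1 times; with the disconnect in `pause()` it is written once.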

Al2Klimov · 2022-04-06T11:17:29Z

Didn't backport to 2.12 for the same reason as you didn’t backport #9207.

julianbrost

> after X split-brains

Not necessarily; just restarting one node is enough.

But note that this change only has an effect with enable_ha explicitly set to true (default is false); otherwise, Resume() shouldn't be called multiple times anyway.

I did the following test:

1. Set enable_ha = true on both my masters (in my setup, this causes only master-2 to write to InfluxDB).
2. Stop master-2 (and wait until master-1 takes over writing to InfluxDB).
3. Start master-2 (and wait for it to take over again).
4. Repeat the previous two steps a few times (to call Resume() a few times on master-1).
5. Stop master-2 one last time so that master-1 takes over again.
6. Observe the InfluxDB queue item rate.

Interpretation of the graph: Both lines show the item rate from Influxdb2Writer (influxdb2writer_influxdb-v2_work_queue_item_rate from the icinga check command, blue is master-1, purple is master-2). Events:

1. enable_ha = false: both masters are writing about 8 items/sec.
2. enable_ha = true is set: now only one master writes about 8 items/sec.
3. Repeated starting and stopping of master-2 with the current master branch: after this, when only master-1 is running, the rate goes up to around 45 items/sec.
4. (In addition to the steps mentioned above) Starting master-2 again so that it takes over again: its rate is again around 8 items/sec, because on this instance Resume() wasn't called multiple times.
5. Repeating the test with this PR: the item rate stays at around 8 items/sec.
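
As a rough plausibility check on these numbers, one can divide the inflated rate by the baseline rate. This is only a sketch under an assumption that is not stated in the thread: that each connected handler contributes roughly the baseline rate. The helper name `estimated_handler_count` is hypothetical, and both rates are the approximate values quoted above, so the result is an estimate rather than an exact count.

```python
def estimated_handler_count(observed_rate, baseline_rate):
    """Ratio of observed to baseline item rate.

    Assumption (illustrative only): every connected handler enqueues
    items at roughly the baseline rate, so the ratio approximates the
    number of handlers connected at the same time.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return observed_rate / baseline_rate


# Approximate rates from the events above (items/sec).
ratio = estimated_handler_count(45.0, 8.0)

if __name__ == "__main__":
    print(f"observed / baseline = {ratio:.3f}")
```

A ratio between 5 and 6 would be consistent with Resume() having been called a handful of times on master-1, which matches "a few" repetitions of the stop/start steps; since both rates are read off a graph, the ratio should not be expected to be an integer.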

There are two instants where master-2 shows a rather high item rate. Keep in mind that this node was restarted repeatedly, and while it was gone, a replay log accumulated that is processed once it starts again, so a higher rate is expected for a short time.

Perfdata writers: disconnect handlers from signals in Pause()

56933b8

Al2Klimov added the bug, area/graphite, area/opentsdb, area/influxdb, area/graylog and area/elastic labels Apr 6, 2022

Al2Klimov added this to the 2.14.0 milestone Apr 6, 2022

Al2Klimov requested a review from julianbrost April 6, 2022 11:12

cla-bot bot added the cla/signed label Apr 6, 2022

Al2Klimov mentioned this pull request Apr 6, 2022

Perfdata writers: disconnect handlers from signals in Pause() #9322

Merged

julianbrost approved these changes Apr 6, 2022

Al2Klimov enabled auto-merge April 6, 2022 14:57

Al2Klimov mentioned this pull request Apr 7, 2022

Perfdata writers: disconnect handlers from signals in Pause() #9329

Merged

Al2Klimov merged commit 39d642a into master Apr 7, 2022

icinga-probot bot deleted the perfdata-resume-signal branch April 7, 2022 13:52

yhabteab pushed a commit that referenced this pull request Sep 5, 2022

Merge pull request #9321 from Icinga/perfdata-resume-signal

e1cdf53

Perfdata writers: disconnect handlers from signals in Pause()
