fix: convert prometheus metrics to use gauge type #640

Draft: wants to merge 1 commit into base: main

titilambert
Contributor

This PR changes the metric types to reflect the actual state of the tasks.
It is linked to #420 (comment).
Here is an example of the new metrics output:

# HELP backrest_backup_bytes_added The total number of bytes added during a backup
# TYPE backrest_backup_bytes_added gauge
backrest_backup_bytes_added{plan_id="testplan",repo_id="test"} 2309
# HELP backrest_backup_bytes_processed The total number of bytes processed during a backup
# TYPE backrest_backup_bytes_processed gauge
backrest_backup_bytes_processed{plan_id="testplan",repo_id="test"} 11
# HELP backrest_backup_file_warnings The total number of file warnings during a backup
# TYPE backrest_backup_file_warnings gauge
backrest_backup_file_warnings{plan_id="testplan",repo_id="test"} 0
# HELP backrest_tasks_duration_secs The duration of a task in seconds
# TYPE backrest_tasks_duration_secs gauge
backrest_tasks_duration_secs{plan_id="_unassociated_",repo_id="_unassociated_",task_type="collect_garbage"} 0.002664103
backrest_tasks_duration_secs{plan_id="testplan",repo_id="test",task_type="backup"} 1.632216997
backrest_tasks_duration_secs{plan_id="testplan",repo_id="test",task_type="hook"} 0.002073014
# HELP backrest_tasks_run_total The total number of tasks run
# TYPE backrest_tasks_run_total counter
backrest_tasks_run_total{plan_id="_unassociated_",repo_id="_unassociated_",status="success",task_type="collect_garbage"} 1
backrest_tasks_run_total{plan_id="testplan",repo_id="test",status="success",task_type="backup"} 1
backrest_tasks_run_total{plan_id="testplan",repo_id="test",status="success",task_type="hook"} 1

@titilambert titilambert marked this pull request as draft January 17, 2025 03:33
@garethgeorge garethgeorge changed the title Improve metrics fix: convert prometheus metrics to use gauge type Jan 18, 2025
@garethgeorge
Owner

garethgeorge commented Jan 20, 2025

Hey -- thanks for PRing this change. Happy to improve the state of prometheus metrics.

One question I have about how this performs: as I understand it, gauge metrics give point-in-time signals of the value of a property, but don't signal things like "how many instances of the event have occurred".

That means it'll be possible to construct a dashboard that shows info for, e.g., the last run of a task, but I'm not sure how this would work for dashboards tracking stats over time, e.g. average bytes added. I'm curious how you see this being used, and whether this is easier to consume for people building dashboards on prometheus.
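
To make the "stats over time" question concrete, here is a minimal Go sketch of how an average is usually derived from cumulative series: take the difference of a running sum and a running count between two scrapes and divide. The `sample` type, its field names, and the numbers are hypothetical and are not taken from backrest's code or from this PR. A gauge that only holds the last run's value cannot be used this way, because runs that finish between two scrapes overwrite each other.

```go
package main

import "fmt"

// sample is one scrape of two hypothetical cumulative series.
// These names are illustrative only; they are not backrest metrics.
type sample struct {
	bytesAddedSum float64 // running total of bytes added across all backups
	backupCount   float64 // running total of completed backups
}

// averagePerRun returns the mean bytes added per backup between two scrapes.
// ok is false when no backup completed in the window.
func averagePerRun(start, end sample) (avg float64, ok bool) {
	runs := end.backupCount - start.backupCount
	if runs <= 0 {
		return 0, false
	}
	return (end.bytesAddedSum - start.bytesAddedSum) / runs, true
}

func main() {
	start := sample{bytesAddedSum: 1000, backupCount: 4}
	end := sample{bytesAddedSum: 1600, backupCount: 7}
	if avg, ok := averagePerRun(start, end); ok {
		// 600 bytes over 3 backups
		fmt.Printf("average bytes added per backup: %.0f\n", avg)
	}
}
```

This sketch assumes the process did not restart between the two scrapes; a restart would reset both totals.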

@titilambert
Contributor Author

> Hey -- thanks for PRing this change. Happy to improve the state of prometheus metrics.
>
> One question I have about how this performs: as I understand it, gauge metrics give point-in-time signals of the value of a property, but don't signal things like "how many instances of the event have occurred".
>
> That means it'll be possible to construct a dashboard that shows info for, e.g., the last run of a task, but I'm not sure how this would work for dashboards tracking stats over time, e.g. average bytes added. I'm curious how you see this being used, and whether this is easier to consume for people building dashboards on prometheus.

You're right, you cannot build this kind of graph with Prometheus only. In my case, I'm using InfluxDB as the database, and any stat that Prometheus can expose, a database like InfluxDB can compute easily.
Prometheus is NOT a database. It is recommended NOT to keep long-term data in it (compared to an actual database).

That said, if we use Prometheus only, with stats over time (like today), the stats are reset every time backrest restarts...

I've been working in DevOps for a decade now, and every time I have to use Prometheus, I'm disappointed to see that it's not a turnkey solution. It always needs more tools to be effective.

So maybe we can keep the ones that were there and just add the gauge ones, so that all of them are available?

What do you think?
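
On the restart concern raised above: a cumulative counter does drop back to zero when the process restarts, but consumers can compensate by treating any decrease between consecutive samples as a reset. The Go sketch below shows that idea in its simplest form. It is roughly what PromQL's `increase()` and `rate()` do, minus their extrapolation logic, and the function name and sample values are made up for illustration.

```go
package main

import "fmt"

// increaseWithResets sums the growth of a counter over ordered samples.
// A drop between consecutive samples is treated as a restart from zero,
// so the new value itself is counted as growth since the reset.
func increaseWithResets(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur >= prev {
			total += cur - prev
		} else {
			total += cur // counter restarted from zero
		}
	}
	return total
}

func main() {
	// Hypothetical scrapes of a task counter; the process restarts
	// between the third and fourth sample.
	scrapes := []float64{3, 4, 5, 1, 2}
	fmt.Println(increaseWithResets(scrapes)) // prints 4
}
```

Increments that happen after the last scrape before a restart are still lost, so this narrows the gap without closing it.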

2 participants