Monitor prometheus exporters "up" status
Closed, Resolved · Public

Assigned To

Authored By

	fgiunchedi
	Feb 19 2018, 11:23 AM

Description

We should monitor the "upness" of the various Prometheus exporters we now have deployed, as reported by Prometheus's up metric. Prometheus generates this metric automatically for every scrape target: it is 1 when the scrape succeeds and 0 whenever Prometheus is unable to scrape metrics from the given exporter.
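
To make the idea concrete, here is a minimal sketch of how up samples could be turned into a per-job availability figure. It processes a result vector shaped like the response of the Prometheus HTTP API instant query (/api/v1/query) for the expression up. The function names, the sample data, and the 0.5 threshold are assumptions made for illustration only; they are not taken from the patches attached to this task.

```python
from collections import defaultdict


def job_availability(result):
    """Return {job: fraction of targets whose up value is 1}."""
    total = defaultdict(int)
    healthy = defaultdict(int)
    for sample in result:
        job = sample["metric"].get("job", "unknown")
        # Instant-query samples carry [timestamp, "<value as string>"].
        value = float(sample["value"][1])
        total[job] += 1
        if value == 1.0:
            healthy[job] += 1
    return {job: healthy[job] / total[job] for job in total}


def low_availability_jobs(result, threshold=0.5):
    """List jobs whose availability is below the (hypothetical) threshold."""
    availability = job_availability(result)
    return sorted(job for job, frac in availability.items() if frac < threshold)


# Hypothetical sample data, not real production output.
SAMPLE = [
    {"metric": {"__name__": "up", "job": "node", "instance": "host1:9100"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "up", "job": "node", "instance": "host2:9100"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "up", "job": "varnish", "instance": "cp1:9131"},
     "value": [1574870000, "0"]},
    {"metric": {"__name__": "up", "job": "varnish", "instance": "cp2:9131"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "up", "job": "varnish", "instance": "cp3:9131"},
     "value": [1574870000, "0"]},
    {"metric": {"__name__": "up", "job": "varnish", "instance": "cp4:9131"},
     "value": [1574870000, "0"]},
]

if __name__ == "__main__":
    print(job_availability(SAMPLE))
    print(low_availability_jobs(SAMPLE))
```

Aggregating per job rather than per instance keeps a single flapping exporter from paging anyone, while still surfacing a job that has lost most of its targets.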

In addition to the up metric, exporters often expose the status of the underlying daemon as <daemon>_up, and we should monitor that too.
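
The difference between the two signals can be sketched as follows: up tells us whether the exporter itself was scraped, while <daemon>_up tells us whether the daemon behind it is healthy. The example below joins two instant-query result vectors by their instance label and classifies each target. The helper names, the status strings, and the sample data (which uses mysql_up as the <daemon>_up example) are assumptions for illustration, not the checks that were actually deployed.

```python
def index_by_instance(result):
    """Map instance label -> numeric sample value for an instant-query vector."""
    return {s["metric"]["instance"]: float(s["value"][1]) for s in result}


def classify(up, daemon_up):
    """Classify one target from its up value and optional <daemon>_up value."""
    if up != 1.0:
        # The scrape failed, so the daemon's state is unknown.
        return "exporter_down"
    if daemon_up is None:
        return "daemon_status_missing"
    return "ok" if daemon_up == 1.0 else "daemon_down"


def classify_targets(up_result, daemon_up_result):
    """Return {instance: status} for every instance present in up_result."""
    up = index_by_instance(up_result)
    daemon = index_by_instance(daemon_up_result)
    return {inst: classify(value, daemon.get(inst)) for inst, value in up.items()}


# Hypothetical sample data, not real production output.
UP = [
    {"metric": {"__name__": "up", "job": "mysql", "instance": "db1:9104"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "up", "job": "mysql", "instance": "db2:9104"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "up", "job": "mysql", "instance": "db3:9104"},
     "value": [1574870000, "0"]},
]
# db3 has no mysql_up sample because its exporter could not be scraped.
MYSQL_UP = [
    {"metric": {"__name__": "mysql_up", "job": "mysql", "instance": "db1:9104"},
     "value": [1574870000, "1"]},
    {"metric": {"__name__": "mysql_up", "job": "mysql", "instance": "db2:9104"},
     "value": [1574870000, "0"]},
]

if __name__ == "__main__":
    for instance, status in sorted(classify_targets(UP, MYSQL_UP).items()):
        print(instance, status)
```

Checking up first matters because a failed scrape yields no fresh <daemon>_up sample at all, so an alert on <daemon>_up alone would stay silent when the exporter itself is unreachable.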

Dashboard linked to the alerts: https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets

Details

Subject	Repo	Branch	Lines +/-
varnish: Runbook and dashboard for down exporter	operations/alerts	master	+4 -4
prometheus: alert on exporter's 'up' metrics	operations/puppet	production	+23 -0
prometheus: alert on low job availability	operations/puppet	production	+12 -0

Event Timeline

fgiunchedi created this task. Feb 19 2018, 11:23 AM

Restricted Application added a subscriber: Aklapper. Feb 19 2018, 11:23 AM

fgiunchedi added a project: User-fgiunchedi. Feb 19 2018, 11:23 AM

fgiunchedi moved this task from Backlog to Up next on the User-fgiunchedi board. Mar 28 2018, 3:49 PM

fgiunchedi moved this task from Up next to Backlog on the User-fgiunchedi board. Oct 9 2019, 11:31 PM

fgiunchedi moved this task from Backlog to Up next on the User-fgiunchedi board. Nov 22 2019, 12:13 PM

Change 552521 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: alert on low job availability

https://gerrit.wikimedia.org/r/552521

gerritbot added a project: Patch-For-Review. Nov 22 2019, 3:00 PM

Change 552521 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: alert on low job availability

https://gerrit.wikimedia.org/r/552521

fgiunchedi moved this task from Up next to Doing on the User-fgiunchedi board. Nov 26 2019, 3:42 PM

Maintenance_bot removed a project: Patch-For-Review. Nov 26 2019, 4:10 PM

Change 553335 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: alert on exporter's 'up' metrics

https://gerrit.wikimedia.org/r/553335

gerritbot added a project: Patch-For-Review. Nov 27 2019, 12:34 PM

fgiunchedi updated the task description. Nov 27 2019, 5:39 PM

Change 553335 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: alert on exporter's 'up' metrics

https://gerrit.wikimedia.org/r/553335