Dominique Roux, 04/20/2019 09:12 PM

The ungleich monitoring infrastructure¶


Introduction¶

We use the following tools for monitoring:

Consul (service discovery)
Prometheus (exporting, collecting, and alerting)
Grafana (presenting)

Consul¶

We run one Consul cluster per datacenter (e.g. place5 and place6).
The Consul servers run on the physical machines (red{1..3} and black{1..3}, respectively), and the agents run on all other monitored machines (such as servers and VMs).

Each Consul agent is configured to publish the services its host provides (e.g. the exporters).
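
As a rough illustration of what publishing an exporter as a service can look like, the sketch below builds a Consul service definition for a node exporter and prints it as JSON. The service name, the tag, and the check interval are assumptions chosen for the example and are not taken from the ungleich configuration; 9100 is the node exporter's default port.

```python
import json


def node_exporter_service(port=9100, interval="10s"):
    """Build a Consul service definition for a node exporter (illustrative only)."""
    return {
        "service": {
            "name": "node-exporter",   # hypothetical service name
            "port": port,              # 9100 is the node exporter default port
            "tags": ["prometheus"],    # hypothetical tag
            "check": {
                # HTTP health check against the exporter's metrics endpoint
                "http": f"http://localhost:{port}/metrics",
                "interval": interval,  # assumed check interval
            },
        }
    }


if __name__ == "__main__":
    # The printed JSON could be placed in the agent's configuration directory.
    print(json.dumps(node_exporter_service(), indent=2))
```

A file with this content in the agent's configuration directory would make the agent announce the exporter to the cluster and mark it unhealthy when the metrics endpoint stops responding.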

The datacenters communicate with each other via WAN gossip [https://www.consul.io/docs/guides/datacenters.html].

Prometheus¶

Prometheus scrapes the data from the exporters on the monitored hosts and stores it. It also sends out alerts when needed (via Alertmanager).
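
To show how the services published in Consul can become scrape targets, here is a minimal sketch that converts entries in the shape returned by Consul's catalog API (`/v1/catalog/service/<name>`) into target groups in Prometheus' file-based service discovery format. This is an assumption made for illustration: the page does not say how Prometheus is wired to Consul, and Prometheus also ships a native Consul discovery mechanism (`consul_sd_configs`). The sample entry, its address, and the label names are hypothetical.

```python
import json


def to_file_sd(entries):
    """Convert Consul catalog entries into Prometheus file_sd target groups."""
    groups = []
    for entry in entries:
        # Consul leaves ServiceAddress empty when the service uses the node address.
        host = entry.get("ServiceAddress") or entry["Address"]
        if ":" in host:
            host = f"[{host}]"  # IPv6 literals need brackets in host:port form
        groups.append({
            "targets": [f"{host}:{entry['ServicePort']}"],
            "labels": {"node": entry["Node"], "datacenter": entry["Datacenter"]},
        })
    return groups


if __name__ == "__main__":
    # Hypothetical catalog entry; 2001:db8::/32 is a documentation prefix.
    sample = [{
        "Node": "red1",
        "Address": "2001:db8::1",
        "Datacenter": "place5",
        "ServiceAddress": "",
        "ServicePort": 9100,
    }]
    print(json.dumps(to_file_sd(sample), indent=2))
```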

Exporters¶

Node (host-specific metrics, e.g. CPU, RAM, and disk usage)
Ceph (Ceph-specific metrics, e.g. pool usage, OSDs)
blackbox (metrics about the online state of HTTP/HTTPS services)

The node exporter runs on all monitored hosts.
The Ceph exporter is provided by Ceph itself and runs on the Ceph manager.
The blackbox exporter is located on the monitoring control machine itself.
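
Unlike the other exporters, the blackbox exporter probes a remote target on request: Prometheus calls its `/probe` endpoint and passes the target as a query parameter. The sketch below builds such a probe URL. The exporter address and the example website are assumptions; 9115 is the blackbox exporter's default port and `http_2xx` is a module from its default configuration.

```python
from urllib.parse import urlencode


def probe_url(target, module="http_2xx", exporter="http://localhost:9115"):
    """Build the blackbox exporter URL that probes `target` with `module`."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter}/probe?{query}"


if __name__ == "__main__":
    # Hypothetical monitored website.
    print(probe_url("https://example.com"))
```

The response to such a request includes the `probe_success` metric, which is 1 when the probe succeeded and 0 otherwise.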

Alerts¶

We configured the following alerts:

Ceph OSDs down
Ceph health state is not OK
Ceph quorum not OK
Ceph pool disk usage too high
Ceph disk usage too high
Instance down
Disk usage too high
Monitored website down
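
In Prometheus these alerts are defined as alerting rules over the scraped metrics; an "instance down" condition, for example, is typically written as `up == 0`. The actual rules and thresholds are not given on this page, so the sketch below only mimics the logic of a "disk usage too high" check in Python. The 90% threshold and the sample values are assumptions, and the metric lines are simplified (real node exporter output carries more labels).

```python
def parse_samples(text):
    """Parse simple 'name{labels} value' lines of the Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, value = line.rsplit(" ", 1)
        samples[key] = float(value)
    return samples


def disk_usage_too_high(samples, mountpoint="/", threshold=0.9):
    """Return True when the used fraction of the filesystem exceeds threshold."""
    labels = f'{{mountpoint="{mountpoint}"}}'
    size = samples[f"node_filesystem_size_bytes{labels}"]
    avail = samples[f"node_filesystem_avail_bytes{labels}"]
    return (1 - avail / size) > threshold


if __name__ == "__main__":
    # Hypothetical, simplified node exporter output.
    text = (
        'node_filesystem_size_bytes{mountpoint="/"} 100\n'
        'node_filesystem_avail_bytes{mountpoint="/"} 5\n'
    )
    print(disk_usage_too_high(parse_samples(text)))
```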

Grafana¶


Updated by Dominique Roux over 6 years ago · 2 revisions
