Monitoring with Prometheus
Prometheus collects metrics by scraping HTTP metrics endpoints at regular intervals.
See the Prometheus getting started guide.
One easy way to try it out is to run Prometheus in Docker.
Query Data
Once targets are connected, you can query their metrics. The following examples use the server load (node_load1) as exported by the node exporter (see below).
Get one value per configured server at a single point in time. Prometheus calls this an "instant vector".
You can also get the value as it was some time ago, using the offset modifier.
Filter on servers that have "rabbit" in the name (=~ matches against a regular expression).
You can also combine multiple conditions (contains "rabbit" but not "rabbit 02").
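The queries described above can also be sent to the Prometheus HTTP API. The following Python sketch spells out one plausible form of each query and builds the request URL for the instant-query endpoint (/api/v1/query). The server address, the one-hour offset, and the exact "rabbit02" pattern are assumptions made for illustration, not values from a real setup.

```python
from urllib.parse import urlencode

# Placeholder address; replace with your own Prometheus server.
PROMETHEUS = "http://my-prometheus-server.example.com:9090"

# One plausible PromQL expression per case described above.
# The 1h offset and the "rabbit02" pattern are illustrative assumptions.
QUERIES = {
    "instant": "node_load1",
    "one_hour_ago": "node_load1 offset 1h",
    "rabbit_only": 'node_load1{instance=~".*rabbit.*"}',
    "rabbit_not_02": 'node_load1{instance=~".*rabbit.*", instance!~".*rabbit02.*"}',
}


def instant_query_url(promql, base=PROMETHEUS):
    """Build the URL for the Prometheus instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


for name, promql in QUERIES.items():
    print(name, instant_query_url(promql))
```

The printed URLs can be fetched with curl or any HTTP client; the response is JSON with one entry per matching server.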
Aggregate values
avg(node_load1{instance=~".*rabbit.*"})
max(node_load1{instance=~".*rabbit.*"})
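Highest or lowest 5 values at each point in time (topk for the highest, bottomk for the lowest)
topk(5, node_load1)
bottomk(5, node_load1)
Difference between the current value and the value 60 minutes ago
node_load1 - (node_load1 offset 60m)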
Instead of getting only one value per target at each point in time, you can also get all values within a time range, for example all values from the last 5 minutes. Prometheus calls this a "range vector".
You can also get the 5-minute window that ended 4 hours ago.
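As a rough illustration of range vectors, the Python sketch below builds the two selectors described above and reads the samples out of a response shaped like the HTTP API's "matrix" result type. The helper names, the instance name, and the sample values are hypothetical.

```python
def range_selector(metric, window, offset=None):
    """Return a PromQL range-vector selector such as node_load1[5m]."""
    selector = f"{metric}[{window}]"
    if offset:
        selector += f" offset {offset}"
    return selector


# Hand-written response in the shape the HTTP API uses for range vectors
# (resultType "matrix"); the instance name and samples are made up.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "rabbit01:9100"},
                "values": [[1700000000, "0.42"], [1700000030, "0.47"]],
            }
        ],
    },
}


def samples_per_instance(response):
    """Map each instance label to its list of (timestamp, float value) pairs."""
    return {
        series["metric"].get("instance", ""): [(ts, float(v)) for ts, v in series["values"]]
        for series in response["data"]["result"]
    }


print(range_selector("node_load1", "5m"))        # values from the last 5 minutes
print(range_selector("node_load1", "5m", "4h"))  # 5-minute window that ended 4 hours ago
print(samples_per_instance(sample_response))
```

Each target contributes several timestamped samples instead of a single one, which is the difference from an instant vector.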
Compare values to the average values
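One possible reading of this comparison is to keep only the servers whose load is above the mean across all servers, which in PromQL could be written as node_load1 > scalar(avg(node_load1)). The Python sketch below reproduces that comparison locally on made-up values so the result is easy to check by hand; the instance names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical node_load1 values, one per instance.
loads = {"rabbit01:9100": 0.5, "rabbit02:9100": 2.5, "web01:9100": 1.5}


def above_average(values):
    """Keep the entries whose value exceeds the mean of all values.

    Assumed PromQL counterpart: node_load1 > scalar(avg(node_load1)).
    scalar() turns the single-element result of avg() into a plain number
    so that it can be compared with every series.
    """
    avg = mean(values.values())
    return {name: value for name, value in values.items() if value > avg}


print(above_average(loads))
```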
Targets
Each target is a metrics endpoint that Prometheus can collect data from. Once configured, targets are listed on the targets page of your Prometheus server, for example http://my-prometheus-server.example.com:9090/targets
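The same information is available as JSON from the HTTP API (GET /api/v1/targets). As a minimal sketch, the Python function below lists the targets that are not healthy; it runs on a hand-written sample in the shape of that response, and the hosts and error text are made up.

```python
def unhealthy_targets(targets_response):
    """Return (scrape URL, last error) for every active target that is not up."""
    active = targets_response["data"]["activeTargets"]
    return [
        (target["scrapeUrl"], target.get("lastError", ""))
        for target in active
        if target["health"] != "up"
    ]


# Hand-written sample in the shape returned by GET /api/v1/targets;
# hosts and the error text are made up.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:8080/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://localhost:8081/metrics", "health": "down",
             "lastError": "connection refused"},
        ]
    },
}

print(unhealthy_targets(sample))
```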
Random
cd client_golang/examples/random
go get -d
go build
# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
Node exporter
The Prometheus node exporter exposes general statistics about the machine it runs on, such as CPU load.
Test it with
curl http://localhost:9100/metrics
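The endpoint returns plain text with one metric per line. As a rough sketch of how that output can be read, the Python function below parses a made-up excerpt in that format; it deliberately ignores comment lines and does not handle optional trailing timestamps.

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' lines into a dict.

    Comment lines (# HELP, # TYPE) are skipped. Lines with an optional
    trailing timestamp are not handled by this sketch.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics


# Made-up excerpt in the text format the node exporter serves.
sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_load5 0.35
"""

print(parse_metrics(sample)["node_load1"])
```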
MongoDB
By default, the metrics endpoint listens on port 9104.
Aerospike
HAProxy
Recent versions of HAProxy come with a built-in metrics endpoint. For older versions, there is a Debian package for an exporter.
Alternatively, you can run the exporter in Docker.
Docker
Docker can expose its stats to Prometheus
Alerting
prometheus.yml
global:
  scrape_interval: 30s
rule_files:
  - 'alerts/*.yml'
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "my-alerting-server.example.com:9093"
Example for an alert file
groups:
  - name: alerting_rules_node_exporter
    rules:
      - alert: OutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HAProxyExporterBadHttpCodes
        expr: sum(rate(haproxy_backend_http_responses_total{code!="2xx"}[5m])) BY (backend) * 100 / ( sum(rate(haproxy_backend_http_responses_total[5m])) BY (backend) + 1) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HAProxy too many bad http code (instance {{ $labels.instance }})"
          description: "HAProxy bad http code rate is more than 5%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
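To make the two alert expressions easier to follow, the Python sketch below redoes their arithmetic on made-up numbers. The function names are hypothetical; only the formulas come from the rules above. In Prometheus, each condition additionally has to hold for 5 minutes (for: 5m) before the alert fires.

```python
def memory_available_percent(mem_available_bytes, mem_total_bytes):
    """Left-hand side of the OutOfMemory expression."""
    return mem_available_bytes / mem_total_bytes * 100


def bad_response_percent(non_2xx_rate, total_rate):
    """Left-hand side of the HAProxyExporterBadHttpCodes expression.

    The +1 keeps the denominator non-zero for idle backends and also
    damps the percentage when traffic is very low.
    """
    return non_2xx_rate * 100 / (total_rate + 1)


# Made-up numbers: 1 GiB available out of 16 GiB,
# and 3 of 49 responses per second that are not 2xx.
print(memory_available_percent(1 * 2**30, 16 * 2**30) < 10)  # condition of OutOfMemory
print(bad_response_percent(3, 49) > 5)                       # condition of HAProxyExporterBadHttpCodes
```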
Dashboards with Grafana
docker run -d -p 3000:3000 --name=grafana -v grafana-storage:/var/lib/grafana grafana/grafana