Monitoring with Prometheus

Prometheus collects data from metrics endpoints.

See Prometheus getting started

One easy way is to run Prometheus in Docker:

docker run -d -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Query Data

Once you have connected targets, you can query them. The following examples use the server load as exported by the node exporter (see below).

Get one value per configured server at a single point in time. Prometheus calls this an "instant vector".

node_load1

You can also get the value as it was some time ago

node_load1 offset 1h

Filter on servers that have "rabbit" in the name (=~ matches the label value against a regular expression).

node_load1{instance=~".*rabbit.*"}

You can also combine multiple conditions (contains rabbit but not rabbit 02)

node_load1{instance=~".*rabbit.*", instance!~".*rabbit.*-02.*"}

Aggregate values

count(node_load1)
avg(node_load1{instance=~".*rabbit.*"})
max(node_load1{instance=~".*rabbit.*"})

Highest (lowest) 4 values per moment in time

topk(4, node_load1)
bottomk(4, node_load1)

Do math with values

node_load1 - (node_load1 offset 60m)

Instead of getting only one value per target per moment in time, you can also get all the values within a time range, for example all values from the last 5 minutes. Prometheus calls this a "range vector".

node_load1[5m]

You can also get 5 minutes from 4 hours ago

node_load1[5m] offset 4h

Compare each server's value to the average across all matching servers

node_load5{instance=~".*rabbit.*"} - ignoring(job, instance) group_left avg WITHOUT (job, instance)(node_load5{instance=~".*rabbit.*"})
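
The queries above can also be run outside the web UI through the Prometheus HTTP API (/api/v1/query). The following is a minimal Python sketch, assuming Prometheus listens on http://localhost:9090 as in the Docker command above. The helper names build_query_url and instant_query are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at this address (see the Docker command above).
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr, base_url=PROMETHEUS_URL):
    """Run an instant query and return the list of result samples."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError("query failed: %r" % body)
    return body["data"]["result"]


if __name__ == "__main__":
    for sample in instant_query('node_load1{instance=~".*rabbit.*"}'):
        # For an instant vector, each sample has its labels and a [timestamp, "value"] pair.
        print(sample["metric"].get("instance"), sample["value"][1])
```

The expression is URL-encoded because PromQL selectors contain characters such as braces and quotes.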

Targets

Each target is a metrics endpoint that Prometheus can collect data from. Once configured in Prometheus, targets should be listed on your Prometheus server at http://my-prometheus-server.example.com:9090/targets

Random

git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build

# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
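
For Prometheus to scrape these three example targets, they need an entry under scrape_configs in prometheus.yml. The sketch below writes such a file to /tmp/prometheus.yml, the path mounted by the Docker command at the top. The job name "example-random" is a placeholder, and "localhost" only works if Prometheus can reach the targets under that name (for example when it runs on the same host or with host networking), so treat this as an assumption.

```python
# Assumptions: the job name "example-random" and the default output path are
# placeholders; the targets are the three ./random processes started above.
SCRAPE_CONFIG = """\
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: 'example-random'
    static_configs:
      - targets: ['localhost:8080', 'localhost:8081', 'localhost:8082']
"""


def write_config(path="/tmp/prometheus.yml"):
    """Write the example scrape configuration to the given path."""
    with open(path, "w") as fh:
        fh.write(SCRAPE_CONFIG)
    return path


if __name__ == "__main__":
    print("wrote", write_config())
```

After restarting Prometheus with this file, the three targets should appear on the /targets page.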

Node exporter

The Prometheus node exporter exports general stats about the current computer, for example the CPU load.

apt-get install prometheus-node-exporter

Test it with

curl http://localhost:9100/metrics
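
The endpoint returns plain text with one sample per line, plus comment lines starting with #. As a rough illustration of that format, here is a small Python sketch that parses such output. It assumes lines without trailing timestamps, and the function name parse_metrics is made up for this example; it is not a complete parser for the exposition format.

```python
def parse_metrics(text):
    """Parse simple exposition lines into (name, labels_text, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last space-separated field (no timestamp).
        series, _, value = line.rpartition(" ")
        name, brace, labels = series.partition("{")
        samples.append((name, labels.rstrip("}") if brace else "", float(value)))
    return samples


if __name__ == "__main__":
    example = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
    for name, labels, value in parse_metrics(example):
        print(name, labels, value)
```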

MongoDB

docker run -p 9104:9104 --name mongo_exporter -d eses/mongodb_exporter -mongodb.uri mongodb://my-mongo-server.example.com:27017

By default, the metrics endpoint is at port 9104

Aerospike

docker run -p 9145:9145 -d --name aerospike_prometheus_exporter ysde/docker-aerospike-exporter --node my-aerospike-server.example.com:3000

HAProxy

Recent versions of HAProxy come with a built-in metrics endpoint. For older versions there is a Debian package for an exporter

apt-get install prometheus-haproxy-exporter

Or run it in Docker

docker run -p 9101:9101 -d --name prometheus-haproxy-exporter quay.io/prometheus/haproxy-exporter --haproxy.scrape-uri="http://user:secret@myserver.example.com:80/stats;csv"

Docker

Docker can expose its stats to Prometheus

Alerting

prometheus.yml

global:
  scrape_interval:     30s

rule_files:
  - 'alerts/*.yml'

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "my-alerting-server.example.com:9093"

Example of an alert file

groups:
  - name: alerting_rules_node_exporter
    rules:
      - alert: OutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
      - alert: HAProxyExporterBadHttpCodes
        expr:  sum(rate(haproxy_backend_http_responses_total{code!="2xx"}[5m])) BY (backend) * 100 / ( sum(rate(haproxy_backend_http_responses_total[5m])) BY (backend) + 1) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HAProxy too many bad HTTP codes (instance {{ $labels.instance }})"
          description: "HAProxy bad http code rate is more than 5%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
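
To make the OutOfMemory expression above concrete, the following Python sketch redoes its arithmetic for a single server. The function names and the byte values are made up for illustration. Note that the rule's "for: 5m" additionally requires the condition to hold for 5 minutes before the alert fires, which this sketch does not model.

```python
def available_percent(mem_available_bytes, mem_total_bytes):
    """Mirror node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100."""
    return mem_available_bytes / mem_total_bytes * 100


def out_of_memory(mem_available_bytes, mem_total_bytes, threshold=10):
    """Return True when less than `threshold` percent of memory is available."""
    return available_percent(mem_available_bytes, mem_total_bytes) < threshold


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical server with 16 GiB total and 1 GiB available: 6.25 percent left.
    print(available_percent(1 * GIB, 16 * GIB), out_of_memory(1 * GIB, 16 * GIB))
    # Hypothetical server with 16 GiB total and 4 GiB available: 25 percent left.
    print(available_percent(4 * GIB, 16 * GIB), out_of_memory(4 * GIB, 16 * GIB))
```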

Dashboards with Grafana

docker volume create grafana-storage
docker run -d -p 3000:3000 --name=grafana -v grafana-storage:/var/lib/grafana grafana/grafana