Ruan Bekker's Blog

From a Curious mind to Posts on Github

Install Blackbox Exporter to Monitor Websites With Prometheus

prometheus

Blackbox Exporter by Prometheus allows probing over endpoints such as http, https, icmp, tcp and dns.

What will we be doing

In this tutorial we will install the blackbox exporter on linux. Im assuming that you have already set up prometheus.

Install the Blackbox Exporter

First create the blackbox exporter user:

1
$ useradd --no-create-home --shell /bin/false blackbox_exporter

Download blackbox exporter and extract:

1
2
$ wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.14.0/blackbox_exporter-0.14.0.linux-amd64.tar.gz
$ tar -xvf blackbox_exporter-0.14.0.linux-amd64.tar.gz

Move the binaries in place and change the ownership:

1
2
$ cp blackbox_exporter-0.14.0.linux-amd64/blackbox_exporter /usr/local/bin/blackbox_exporter
$ chown blackbox_exporter:blackbox_exporter /usr/local/bin/blackbox_exporter

Remove the downloaded archive:

1
$ rm -rf blackbox_exporter-0.14.0.linux-amd64*

Create the blackbox directory and create the config:

1
2
$ mkdir /etc/blackbox_exporter
$ vim /etc/blackbox_exporter/blackbox.yml

Populate this config:

1
2
3
4
5
6
7
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []
      method: GET

Update the permissions of the config so that the user has ownership:

1
$ chown blackbox_exporter:blackbox_exporter /etc/blackbox_exporter/blackbox.yml

Create the systemd unit file:

1
$ vim /etc/systemd/system/blackbox_exporter.service

Populate the systemd unit file configuration:

1
2
3
4
5
6
7
8
9
10
11
12
13
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox_exporter
Group=blackbox_exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

[Install]
WantedBy=multi-user.target

Reload the systemd daemon and restart the service:

1
2
$ systemctl daemon-reload
$ systemctl start blackbox_exporter

The service should be started, verify:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ systemctl status blackbox_exporter
  blackbox_exporter.service - Blackbox Exporter
   Loaded: loaded (/etc/systemd/system/blackbox_exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-05-08 00:02:40 UTC; 5s ago
 Main PID: 10084 (blackbox_export)
    Tasks: 6 (limit: 4704)
   CGroup: /system.slice/blackbox_exporter.service
           └─10084 /usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

May 08 00:02:40 ip-172-31-41-126 systemd[1]: Started Blackbox Exporter.
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.5229204Z caller=main.go:213 msg="Starting blackbox_exporter" version="(version=0.14.0, branch=HEAD, revision=bb
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.52553523Z caller=main.go:226 msg="Loaded config file"
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.525695324Z caller=main.go:330 msg="Listening on address" address=:9115

Enable the service on boot:

1
$ systemctl enable blackbox_exporter

Configure Prometheus

Next, we need to provide context to prometheus on what to monitor. We will inform prometheus to monitor a web endpoint on port 8080 using the blackbox exporter (we will create a python simplehttpserver to run on port 8080).

Edit the prometheus config /etc/prometheus/prometheus.yml and append the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://localhost:8080
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Open a new terminal, create a index.html:

1
$ echo "ok" > index.html

Then start a SimpleHTTPServer on port 8080:

1
$ python -m SimpleHTTPServer 8080

Head back to the previous terminal session and restart prometheus:

1
$ systemctl restart prometheus

Configure the Alarm definition:

Create a alarm definition that desribes that defines when to notify when a endpoint goes down:

1
$ vim /etc/prometheus/alert.rules.yml

And our alert definition:

1
2
3
4
5
6
7
8
9
10
groups:
- name: alert.rules
  rules:
  - alert: EndpointDown
    expr: probe_success == 0
    for: 10s
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint  down"

Ensure that the permission is set:

1
$ chown prometheus:prometheus /etc/prometheus/alert.rules.yml

Use the promtool to validate that the alert is correctly configured:

1
2
3
$ promtool check rules /etc/prometheus/alert.rules.yml
Checking /etc/prometheus/alert.rules.yml
  SUCCESS: 1 rules found

If everything is good, restart prometheus:

1
$ systemctl restart prometheus

Blackbox Exporter Dashboard

To install a blackbox exporter dashboard: https://grafana.com/dashboards/7587, create a new dashboard, select import, provide the ID: 7587, select the prometheus datasource and select save.

The dashboard should look similar to this:

blackbox-exporter

Next up, Alertmanager

In the next tutorial we will setup Alertmanager to alert when our endpoint goes down

Resources

See all #prometheus blogposts

Install Alertmanager to Alert Based on Metrics From Prometheus

prometheus

So we are pushing our time series metrics into prometheus, and now we would like to alarm based on certain metric dimensions. That’s where alertmanager fits in. We can setup targets and rules, once rules for our targets does not match, we can alarm to destinations suchs as slack, email etc.

What we will be doing:

In our previous tutorial we installed blackbox exporter to probe a endpoint. Now we will install Alertmanager and configure an alert to notify us via email and slack when our endpoint goes down. See this post if you have not seen the previous tutorial.

Install Alertmanager

Create the user for alertmanager:

1
$ useradd --no-create-home --shell /bin/false alertmanager

Download alertmanager and extract:

1
2
$ https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
$ tar -xvf alertmanager-0.17.0.linux-amd64.tar.gz

Move alertmanager and amtool birnaries in place:

1
2
$ cp alertmanager-0.17.0.linux-amd64/alertmanager /usr/local/bin/
$ cp alertmanager-0.17.0.linux-amd64/amtool /usr/local/bin/

Ensure that the correct permissions are in place:

1
2
$ chown alertmanager:alertmanager /usr/local/bin/alertmanager
$ chown alertmanager:alertmanager /usr/local/bin/amtool

Cleanup:

1
$ rm -rf alertmanager-0.17.0*

Configure Alertmanager:

Create the alertmanager directory and configure the global alertmanager configuration:

1
2
$ mkdir /etc/alertmanager
$ vim /etc/alertmanager/alertmanager.yml

Provide the global config and ensure to populate your personal information. See this post to create a slack webhook.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
global:
  smtp_smarthost: 'smtp.domain.net:587'
  smtp_from: 'AlertManager <mailer@domain.com>'
  smtp_require_tls: true
  smtp_hello: 'alertmanager'
  smtp_auth_username: 'username'
  smtp_auth_password: 'password'

  slack_api_url: 'https://hooks.slack.com/services/x/xx/xxx'

route:
  group_by: ['instance', 'alert']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-1

receivers:
  - name: 'team-1'
    email_configs:
      - to: 'user@domain.com'
    slack_configs:
      # https://prometheus.io/docs/alerting/configuration/#slack_config
      - channel: 'system_events'
      - username: 'AlertManager'
      - icon_emoji: ':joy:'

Ensure the permissions are in place:

1
$ chown alertmanager:alertmanager -R /etc/alertmanager

Create the alertmanager systemd unit file:

1
$ vim /etc/systemd/system/alertmanager.service

And supply the unit file configuration. Note that I am exposing port 9093 directly as Im not using a reverse proxy.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager/
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.external-url http://0.0.0.0:9093

[Install]
WantedBy=multi-user.target

Now we need to inform prometheus that we will send alerts to alertmanager to it’s exposed port:

1
$ vim /etc/prometheus/prometheus.yml

And supply the alertmanager configuration for prometheus:

1
2
3
4
5
6
7
...
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
...

So when we get alerted, our alert will include a link to our alert. We need to provide the base url of that alert. That get’s done in our alertmanager systemd unit file: /etc/systemd/system/alertmanager.service under --web.external-url passing the alertmanager base ip address:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager/
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.external-url http://<your.alertmanager.ip.address>:9093

[Install]
WantedBy=multi-user.target

Then we need to do the same with the prometheus systemd unit file: /etc/systemd/system/prometheus.service under --web.external-url passing the prometheus base ip address:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.external-url http://<your.prometheus.ip.address>

[Install]
WantedBy=multi-user.target

Since we have edited the systemd unit files, we need to reload the systemd daemon:

1
$ systemctl daemon-reload

Then restart prometheus and alertmanager:

1
2
$ systemctl restart prometheus
$ systemctl restart alertmanager

Inspect the status of alertmanager and prometheus:

1
2
$ systemctl status alertmanager
$ systemctl status prometheus

If everything seems good, enable alertmanager on boot:

1
$ systemctl enable alertmanager

Access Alertmanager:

Access alertmanager on your endpoint on port 9093:

alertmanager

From our previous tutorial we started a local web service on port 8080 that is being monitored by prometheus. Let’s stop that service to test out the alerting. You should get a notification via email:

alertmanager

And the notification via slack:

alertmanager

When you start the service again and head over to the prometheus ui under alerts, you will see that the service recovered:

prometheus

Install Prometheus Alertmanager Plugin

Install the Prometheus Alertmanager Plugin in Grafana. Head to the instance where grafana is installed and install the plugin:

1
$ grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource

Once the plugin is installed, restart grafana:

1
$ service grafana-server restart

Install the dasboard grafana.com/dashboards/8010. Create a new datasource, select the prometheus-alertmanager datasource, configure and save.

Add a new dasboard, select import and provide the ID 8010, select the prometheus-alertmanager datasource and save. You should see the following (more or less):

prometheus-alertmanager

Resources

See all #prometheus blogposts

Install Grafana to Visualize Your Metrics From Datasources Such as Prometheus on Linux

image

Grafana is a Open Source Dashboarding service that allows you to monitor, analyze and graph metrics from datasources such as prometheus, influxdb, elasticsearch, aws cloudwatch, and many more.

Not only is grafana amazing, its super pretty!

Example of how a dashboard might look like:

E24B39B1-23C8-44C5-959D-6E6275F8FE99

What are we doing today

In this tutorial we will setup grafana on linux. If you have not set up prometheus, follow this blogpost to install prometheus.

Install Grafana

I will be demonstrating how to install grafana on debian, if you have another operating system, head over to grafana documentation for other supported operating systems.

Get the gpg key:

1
$ curl https://packages.grafana.com/gpg.key | sudo apt-key add -

Import the public keys:

1
$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys  8C8C34C524098CB6 

Add the latest stable packages to your repository:

1
$ add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

Install a pre-requirement package:

1
$ apt install apt-transport-https -y

Update the repository index and install grafana:

1
$ apt update && sudo apt install grafana -y

Once grafana is installed, start the service:

1
$ service grafana-server start

Then enable the service on boot:

1
$ update-rc.d grafana-server defaults

If you want to control the service via systemd:

1
2
3
$ systemctl daemon-reload
$ systemctl start grafana-server
$ systemctl status grafana-server

Optional: Nginx Reverse Proxy

If you want to front your grafana instance with a nginx reverse proxy:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ cat /etc/nginx/sites-enabled/grafana
server {
    listen 80;
    server_name grafana.domain.com;

    location / {
        proxy_pass http://127.0.0.1:3000/;
        proxy_redirect http://127.0.0.1:3000/ /;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }

Then restart nginx:

1
$ systemctl restart nginx

Access Grafana

If you are accessing grafana directly, access grafana on http://your-grafana-ip:3000/ and your username is admin and password admin

Dashboarding Tutorials

Have a look at this screencast where the guys from grafana show you how to build dashboards:

Also have a look at their public repository of dashboards

For more tutorials on prometheus and metrics have a look at #prometheus

Install Pushgateway to Expose Metrics to Prometheus

In most cases when we want to scrape a node for metrics, we will install node-exporter on a host and configure prometheus to scrape the configured node to consume metric data. But in certain cases we want to push custom metrics to prometheus. In such cases, we can make use of pushgateway.

Pushgateway allows you to push custom metrics to push gateway’s endpoint, then we configure prometheus to scrape push gateway to consume the exposed metrics into prometheus.

Pre-Requirements

If you have not set up Prometheus, head over to this blogpost to set up prometheus on Linux.

What we will do?

In this tutorial, we will setup pushgateway on linux and after pushgateway has been setup, we will push some custom metrics to pushgateway and configure prometheus to scrape metrics from pushgateway.

Install Pushgateway

Get the latest version of pushgateway from prometheus.io, then download and extract:

1
2
$ wget https://github.com/prometheus/pushgateway/releases/download/v0.8.0/pushgateway-0.8.0.linux-amd64.tar.gz
$ tar -xvf pushgateway-0.8.0.linux-amd64.tar.gz

Create the pushgateway user:

1
$ useradd --no-create-home --shell /bin/false pushgateway

Move the binary in place and update the permissions to the user that we created:

1
2
$ cp pushgateway-0.8.0.linux-amd64/pushgateway /usr/local/bin/pushgateway
$ chown pushgateway:pushgateway /usr/local/bin/pushgateway

Create the systemd unit file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
$ cat > /etc/systemd/system/pushgateway.service << EOF
[Unit]
Description=Pushgateway
Wants=network-online.target
After=network-online.target

[Service]
User=pushgateway
Group=pushgateway
Type=simple
ExecStart=/usr/local/bin/pushgateway \
    --web.listen-address=":9091" \
    --web.telemetry-path="/metrics" \
    --persistence.file="/tmp/metric.store" \
    --persistence.interval=5m \
    --log.level="info" \
    --log.format="logger:stdout?json=true"

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and restart the pushgateway service:

1
2
$ systemctl daemon-reload
$ systemctl restart pushgateway

Ensure that pushgateway has been started:

1
2
3
4
5
6
7
8
9
10
$ systemctl status pushgateway
  pushgateway.service - Pushgateway
   Loaded: loaded (/etc/systemd/system/pushgateway.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-05-07 09:05:57 UTC; 2min 33s ago
 Main PID: 6974 (pushgateway)
    Tasks: 6 (limit: 4704)
   CGroup: /system.slice/pushgateway.service
           └─6974 /usr/local/bin/pushgateway --web.listen-address=:9091 --web.telemetry-path=/metrics --persistence.file=/tmp/metric.store --persistence.interval=5m --log.level=info --log.format=logger:st

May 07 09:05:57 ip-172-31-41-126 systemd[1]: Started Pushgateway.

Configure Prometheus

Now we want to configure prometheus to scrape pushgateway for metrics, then the scraped metrics will be injected into prometheus’s time series database:

At the moment, I have prometheus, node-exporter and pushgateway on the same node so I will provide my complete prometheus configuration, If you are just looking for the pushgateway config, it will be the last line:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$ cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']

Restart prometheus:

1
$ systemctl restart prometheus

Push metrics to pushgateway

First we will look at a bash example to push metrics to pushgateway:

1
$ echo "cpu_utilization 20.25" | curl --data-binary @- http://localhost:9091/metrics/job/my_custom_metrics/instance/10.20.0.1:9000/provider/hetzner

Have a look at pushgateway’s metrics endpoint:

1
2
3
$ curl -L http://localhost:9091/metrics/
# TYPE cpu_utilization untyped
cpu_utlization{instance="10.20.0.1:9000",job="my_custom_metrics",provider="hetzner"} 20.25

Let’s look at a python example on how we can push metrics to pushgateway:

1
2
3
4
5
6
7
8
9
10
import requests

job_name='my_custom_metrics'
instance_name='10.20.0.1:9000'
provider='hetzner'
payload_key='cpu_utilization'
payload_value='21.90'

response = requests.post('http://localhost:9091/metrics/job/{j}/instance/{i}/team/{t}'.format(j=job_name, i=instance_name, t=team_name), data='{k} {v}\n'.format(k=payload_key, v=payload_value))
print(response.status_code)

With this method, you can push any custom metrics (bash, lambda function, etc) to pushgateway and allow prometheus to consume that data into it’s time series database.

Resources:

See #prometheus for more posts on Prometheus

Running a HA MySQL Galera Cluster on Docker Swarm

image

In this post we will setup a highly available mysql galera cluster on docker swarm.

About

The service is based of docker-mariadb-cluster repository and it’s designed not to have any persistent data attached to the service, but rely on the “nodes” to replicate the data.

Note, that however this proof of concept works, I always recommend to use a remote mysql database outside your cluster, such as RDS etc.

Since we don’t persist any data on the mysql cluster, I have associated a dbclient service that will run continious backups, which we will persist the path where the backups reside to disk.

Deploy the MySQL Cluster

The docker-compose.yml that we will use looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
version: '3.5'
services:
  dbclient:
    image: alpine
    environment:
      - BACKUP_ENABLED=1
      - BACKUP_INTERVAL=3600
      - BACKUP_PATH=/data
      - BACKUP_FILENAME=db_backup
    networks:
      - dbnet
    entrypoint: |
      sh -c 'sh -s << EOF
      apk add --no-cache mysql-client
      while true
        do
          if [ $$BACKUP_ENABLED == 1 ]
            then
              sleep $$BACKUP_INTERVAL
              mkdir -p $$BACKUP_PATH/$$(date +%F)
              echo "$$(date +%FT%H.%m) - Making Backup to : $$BACKUP_PATH/$$(date +%F)/$$BACKUP_FILENAME-$$(date +%FT%H.%m).sql.gz"
              mysqldump -u root -ppassword -h dblb --all-databases | gzip > $$BACKUP_PATH/$$(date +%F)/$$BACKUP_FILENAME-$$(date +%FT%H.%m).sql.gz
              find $$BACKUP_PATH -mtime 7 -delete
          fi
        done
      EOF'
    volumes:
      - vol_dbclient:/data
    deploy:
      mode: replicated
      replicas: 1

  dbcluster:
    image: toughiq/mariadb-cluster
    networks:
      - dbnet
    environment:
      - DB_SERVICE_NAME=dbcluster
      - MYSQL_ROOT_PASSWORD=password
      - MYSQL_DATABASE=mydb
      - MYSQL_USER=mydbuser
      - MYSQL_PASSWORD=mydbpass
    deploy:
      mode: replicated
      replicas: 1

  dblb:
    image: toughiq/maxscale
    networks:
      - dbnet
    ports:
      - 3306:3306
    environment:
      - DB_SERVICE_NAME=dbcluster
      - ENABLE_ROOT_USER=1
    deploy:
      mode: replicated
      replicas: 1

volumes:
  vol_dbclient:
    driver: local

networks:
  dbnet:
    name: dbnet
    driver: overlay

The dbclient is configured to be in the same network as the cluster so it can reach the mysql service. The default behavior is that it will make a backup every hour (3600 seconds) to the /data/{date}/ path.

Deploy the stack:

1
2
3
4
5
$ docker stack deploy -c docker-compose.yml galera
Creating network dbnet
Creating service galera_dbcluster
Creating service galera_dblb
Creating service galera_dbclient

Have a look to see if all the services is running:

1
2
3
4
5
$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
jm7p70qre72u        galera_dbclient     replicated          1/1                 alpine:latest
p8kcr5y7szte        galera_dbcluster    replicated          1/1                 toughiq/mariadb-cluster:latest
1hu3oxhujgfm        galera_dblb         replicated          1/1                 toughiq/maxscale:latest          :3306->3306/tcp

The Backup Client

As mentioned the backup client backs up to the /data/ path:

1
2
3
4
$ docker exec -it $(docker ps -f name=galera_dbclient -q) find /data/
/data/
/data/2019-05-10
/data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

Let’s go ahead and populate some data into our mysql database:

1
2
3
4
$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb
MySQL [(none)]> create table mydb.foo (name varchar(10));
MySQL [(none)]> insert into mydb.foo values('ruan');
MySQL [(none)]> exit

Scale the Cluster

At the moment we only have 1 replica for our mysql cluster, let’s go ahead and scale the cluster to 3 replicas:

1
2
3
4
5
6
7
$ docker service scale galera_dbcluster=3
galera_dbcluster scaled to 3
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged

Verify that the service has been scaled:

1
2
3
4
5
$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
jm7p70qre72u        galera_dbclient     replicated          1/1                 alpine:latest
p8kcr5y7szte        galera_dbcluster    replicated          3/3                 toughiq/mariadb-cluster:latest
1hu3oxhujgfm        galera_dblb         replicated          1/1                 toughiq/maxscale:latest          :3306->3306/tcp

Test, by reading from the database:

1
2
3
4
5
6
$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Simulate a Node Failure:

Simulate a node failure by killing one of the mysql containers:

1
$ docker kill 9e336032ab52

Verify that one container is missing from our service:

1
2
3
$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
p8kcr5y7szte        galera_dbcluster    replicated          2/3                 toughiq/mariadb-cluster:latest

While the container is provisioning, as we have 2 out of 3 running containers, read the data 3 times so test that the round robin queries dont hit the affected container (the dblb wont route traffic to the affected container):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Verify that the 3rd container has checked in:

1
2
3
$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
p8kcr5y7szte        galera_dbcluster    replicated          3/3                 toughiq/mariadb-cluster:latest

How to Restore?

I’m deleting the database to simulate the scenario where we need to restore:

1
2
$ docker exec -it $(docker ps -f name=galera_dbclient -q) sh
> mysql -uroot -ppassword -h dblb -e'drop database mydb;'

Ensure the db is not present:

1
2
> mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
ERROR 1146 (42S02) at line 1: Table 'mydb.foo' doesn't exist

Find the archive and extract:

1
2
3
4
5
6
> find /data/
/data/
/data/2019-05-10
/data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

> gunzip /data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

Restore the backed up database to MySQL:

1
> mysql -uroot -ppassword -h dblb < /data/2019-05-10/db_backup-2019-05-10T10.05.sql

Test that we can read our data:

1
2
3
4
5
6
> mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Create Secrets With Vaults Transits Secret Engine

Vault’s transit secrets engine handles cryptographic functions on data-in-transit. Vault doesn’t store the data sent to the secrets engine, so it can also be viewed as encryption as a service.

In this tutorial we will demonstrate how to use Vault’s Transit Secret Engine.

Related Posts:

Enable the Transit Engine:

Enable transit secret engine using the /sys/mounts endpoint:

1
$ curl --header "X-Vault-Token: $VAULT_TOKEN" -XPOST -d '{"type": "transit", "description": "encs encryption"}' http://127.0.0.1:8200/v1/sys/mounts/transit

Create the Key Ring:

Create an encryption key ring named fookey using the transit/keys endpoint:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" -XGET http://127.0.0.1:8200/v1/transit/keys/fookey | jq
{
  "request_id": "8375227a-4a9f-a108-0b89-84c448419e80",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "allow_plaintext_backup": false,
    "deletion_allowed": false,
    "derived": false,
    "exportable": false,
    "keys": {
      "1": 1554654295
    },
    "latest_version": 1,
    "min_available_version": 0,
    "min_decryption_version": 1,
    "min_encryption_version": 0,
    "name": "fookey",
    "supports_decryption": true,
    "supports_derivation": true,
    "supports_encryption": true,
    "supports_signing": false,
    "type": "aes256-gcm96"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Encoding

Encode your string:

1
2
$ base64 <<< "hello world"
aGVsbG8gd29ybGQK

Encrypt

To encrypt your secret, use the transit/encrypt endpoint:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request POST  --data '{"plaintext": "aGVsbG8gd29ybGQK"}' http://127.0.0.1:8200/v1/transit/encrypt/fookey | jq
{
  "request_id": "ab00ba0f-9e45-0aca-e3c1-7765fd83fc3c",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "ciphertext": "vault:v1:Yo4U6xXFM2FoBOaUrw0w3EpSlJS6gmsa4HP1xKtjrk0+xSqi5Rvjvg=="
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Decrypt:

Use the transit/decrypt endpoint to decrypt the ciphertext:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request POST  --data '{"ciphertext": "vault:v1:Yo4U6xXFM2FoBOaUrw0w3EpSlJS6gmsa4HP1xKtjrk0+xSqi5Rvjvg=="}' http://127.0.0.1:8200/v1/transit/decrypt/fookey | jq
{
  "request_id": "3d9743a0-2daf-823c-f413-8c8a90753479",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "plaintext": "aGVsbG8gd29ybGQK"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Decoding

Decode the response:

1
2
$ base64 --decode <<< "aGVsbG8gd29ybGQK"
hello world

Resources

Use the Vault API to Provision App Keys and Create KV Pairs

In this tutorial we will use Vault API to create a user and allow that user to write/read key/value pairs from a given path.

Related Posts:

Credentials / Authentication

Export Vault Root Tokens:

1
2
$ export ROOT_TOKEN="$(cat ~/.vault-token)"
$ export VAULT_TOKEN=${ROOT_TOKEN}

Check the vault status:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/sys/health | jq
{
  "initialized": true,
  "sealed": false,
  "standby": false,
  "performance_standby": false,
  "replication_performance_mode": "disabled",
  "replication_dr_mode": "disabled",
  "server_time_utc": 1554652468,
  "version": "1.1.0",
  "cluster_name": "vault-cluster-bfb00cd7",
  "cluster_id": "dc1dc9a6-xx-xx-xx-a2870f475e7a"
}

Do a lookup for the root user:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/auth/token/lookup-self | jq
{
  "request_id": "69a19f66-5bad-3af2-81a5-81ca24e50b02",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "accessor": "A7Xkik1ebWpUfzqNrvADmQ08",
    "creation_time": 1554651149,
    "creation_ttl": 0,
    "display_name": "root",
    "entity_id": "",
    "expire_time": null,
    "explicit_max_ttl": 0,
    "id": "s.po8HkMdCnnAerlCAeHGGGszQ",
    "meta": null,
    "num_uses": 0,
    "orphan": true,
    "path": "auth/token/root",
    "policies": [
      "root"
    ],
    "ttl": 0,
    "type": "service"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Create the Roles

Create the AppRole:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
$ curl -s -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" -d '{"type": "approle"}' http://127.0.0.1:8200/v1/sys/auth/approle | jq
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/sys/auth | jq
{
  "token/": {
    "accessor": "auth_token_31f2381e",
    "config": {
      "default_lease_ttl": 0,
      "force_no_cache": false,
      "max_lease_ttl": 0,
      "token_type": "default-service"
    },
    "description": "token based credentials",
    "local": false,
    "options": null,
    "seal_wrap": false,
    "type": "token"
  },
  "approle/": {
    "accessor": "auth_approle_d542dcad",
    "config": {
      "default_lease_ttl": 0,
      "force_no_cache": false,
      "max_lease_ttl": 0,
      "token_type": "default-service"
    },
    "description": "",
    "local": false,
    "options": {},
    "seal_wrap": false,
    "type": "approle"
  },
  "request_id": "20554948-b8e0-4254-f21d-f9ad25f1e5d5",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "approle/": {
      "accessor": "auth_approle_d542dcad",
      "config": {
        "default_lease_ttl": 0,
        "force_no_cache": false,
        "max_lease_ttl": 0,
        "token_type": "default-service"
      },
      "description": "",
      "local": false,
      "options": {},
      "seal_wrap": false,
      "type": "approle"
    },
    "token/": {
      "accessor": "auth_token_31f2381e",
      "config": {
        "default_lease_ttl": 0,
        "force_no_cache": false,
        "max_lease_ttl": 0,
        "token_type": "default-service"
      },
      "description": "token based credentials",
      "local": false,
      "options": null,
      "seal_wrap": false,
      "type": "token"
    }
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Create the test policy:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ curl -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" -d '{"policy": "{\"name\": \"test\", \"path\": {\"secret/*\": {\"policy\": \"write\"}}}"}' http://127.0.0.1:8200/v1/sys/policy/test
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/sys/policy/test | jq
{
  "name": "test",
  "rules": "{\"name\": \"test\", \"path\": {\"secret/*\": {\"policy\": \"write\"}}}",
  "request_id": "e4f55dc0-575f-ead9-48f6-43154153889a",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "name": "test",
    "rules": "{\"name\": \"test\", \"path\": {\"secret/*\": {\"policy\": \"write\"}}}"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Attach the policy to the approle:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ curl -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" -d '{"policies": "test"}' http://127.0.0.1:8200/v1/auth/approle/role/app
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" 'http://127.0.0.1:8200/v1/auth/approle/role?list=true' | jq .
{
  "request_id": "e645cad9-9010-4299-0e6b-0baf6d9194b8",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "keys": [
      "app"
    ]
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Enable the kv store:

1
$ curl -H "X-Vault-Token: ${VAULT_TOKEN}" -XPOST --data '{"type": "kv", "description": "my key value store", "config": {"force_no_cache": true}}' http://127.0.0.1:8200/v1/sys/mounts/secret

Create the User Credentials

Get the role_id:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/auth/approle/role/app/role-id | jq
{
  "request_id": "e803a1bf-a492-dad7-68db-bb1506752e03",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "role_id": "3e365c72-7aad-f4e4-521c-d7cf0dd83c0f"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Create the secret_id:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ curl -s -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/auth/approle/role/app/secret-id | jq
{
  "request_id": "b56d20c0-ff8a-a1fe-4d5f-42e57b625b83",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "secret_id": "5eecfe29-d6e1-50e6-7a70-04c6bea42b76",
    "secret_id_accessor": "2fa80586-32b9-1c6f-fe1d-7c547e5403e5"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Create the token with the role_id and secret_id:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
$ curl -s -XPOST -d '{"role_id": "3e365c72-7aad-f4e4-521c-d7cf0dd83c0f","secret_id": "5eecfe29-d6e1-50e6-7a70-04c6bea42b76"}' http://127.0.0.1:8200/v1/auth/approle/login | jq
{
  "request_id": "82470940-ef09-bcbb-f7a0-bdf085b4f47b",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": null,
  "auth": {
    "client_token": "s.7EtwtRGsZWOtkqcMvj3UMLP0",
    "accessor": "2TPL1vg5IZXgVF6Xf1RRzbmL",
    "policies": [
      "default",
      "test"
    ],
    "token_policies": [
      "default",
      "test"
    ],
    "metadata": {
      "role_name": "app"
    },
    "lease_duration": 2764800,
    "renewable": true,
    "entity_id": "d5051b01-b7ce-626c-a9f4-e1663f8c23e8",
    "token_type": "service",
    "orphan": true
  }
}

Create KV Pairs with New User

Export the user auth with the received token:

1
2
$ export APP_TOKEN=s.7EtwtRGsZWOtkqcMvj3UMLP0
$ export VAULT_TOKEN=$APP_TOKEN

Verify if you can lookup your own info:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/auth/token/lookup-self | jq
{
  "request_id": "2e69cd68-8668-3159-6440-c430cb61d2e6",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "accessor": "2TPL1vg5IZXgVF6Xf1RRzbmL",
    "creation_time": 1554651882,
    "creation_ttl": 2764800,
    "display_name": "approle",
    "entity_id": "d5051b01-b7ce-626c-a9f4-e1663f8c23e8",
    "expire_time": "2019-05-09T15:44:42.1013993Z",
    "explicit_max_ttl": 0,
    "id": "s.7EtwtRGsZWOtkqcMvj3UMLP0",
    "issue_time": "2019-04-07T15:44:42.1013788Z",
    "meta": {
      "role_name": "app"
    },
    "num_uses": 0,
    "orphan": true,
    "path": "auth/approle/login",
    "policies": [
      "default",
      "test"
    ],
    "renewable": true,
    "ttl": 2764556,
    "type": "service"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Create a KV pair:

1
$ curl -s -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" -d '{"app_password": "secret123"}' http://127.0.0.1:8200/v1/secret/app01/app_password

Read the secret from KV pair:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ curl -s -XGET -H "X-Vault-Token: ${VAULT_TOKEN}" http://127.0.0.1:8200/v1/secret/app01/app_password | jq
{
  "request_id": "70d5f16d-2abb-fcfd-063f-0e21d9cef8fd",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 2764800,
  "data": {
    "app_password": "secret123"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Try to write outside the allowed path:

1
2
$ curl -s -XPOST -H "X-Vault-Token: ${VAULT_TOKEN}" -d '{"app_password": "secret123"}' http://127.0.0.1:8200/v1/secrets/app01/app_password
{"errors":["1 error occurred:\n\t* permission denied\n\n"]}

Resources:

Persist Vault Data With Amazon S3 as a Storage Backend

In a previous post we have set up the vault server on docker, but using a file backend to persist our data.

In this tutorial we will configure vault to use amazon s3 as a storage backend to persist our data for vault.

Provision S3 Bucket

Create the S3 Bucket where our data will reside:

1
$ aws s3 mb --region=eu-west-1 s3://somename-vault-backend

Vault Config

Create the vault config, where we will provide details about our storage backend and configuration for the vault server:

1
$ vim volumes/config/s3vault.json

Populate the config file with the following details, you will just need to provide your own credentials:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
{
  "backend": {
    "s3": {
      "region": "eu-west-1",
      "access_key": "ACCESS_KEY",
      "secret_key": "SECRET_KEY",
      "bucket": "somename-vault-backend"
    }
  },
  "listener": {
    "tcp":{
      "address": "0.0.0.0:8200",
      "tls_disable": 1
    }
  },
  "ui": true
}

Docker Compose

As we are using docker to deploy our vault server, our docker-compose.yml:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ cat > docker-compose.yml << EOF
version: '2'
services:
  vault:
    image: vault
    container_name: vault
    ports:
      - "8200:8200"
    restart: always
    volumes:
      - ./volumes/logs:/vault/logs
      - ./volumes/file:/vault/file
      - ./volumes/config:/vault/config
    cap_add:
      - IPC_LOCK
    entrypoint: vault server -config=/vault/config/s3vault.json
EOF

Deploy the vault server:

1
$ docker-compose up

Go ahead and create some secrets, then deploy the docker container on another host to test out the data persistence.

Setup Prometheus and Node Exporter on Ubuntu for Epic Monitoring

image

Prometheus is one of those awesome open source monitoring services that I simply cannot live without. Prometheus is a Time Series Database that collects metrics from services using it’s exporters functionality. Prometheus has its own query language called PromQL and makes graphing epic visualiztions with services such as Grafana a breeze.

What are we doing today

We will install the prometheus service and set up node_exporter to consume node related metrics such as cpu, memory, io etc that will be scraped by the exporter configuration on prometheus, which then gets pushed into prometheus’s time series database. Which can then be used by services such as Grafana to visualize the data.

Other exporters is also available, such as: haproxy_exporter, blackbox_exporter etc, then you also get pushgateway which is used to push data to, and then your exporter configuration scrapes the data from the pushgateway endpoint. In a later tutorial, we will set up push gateway as well.

Install Prometheus

First, let’s provision our dedicated system users for prometheus and node exporter:

1
2
$ useradd --no-create-home --shell /bin/false prometheus
$ useradd --no-create-home --shell /bin/false node_exporter

Create the directories for it’s system files:

1
2
$ mkdir /etc/prometheus
$ mkdir /var/lib/prometheus

Apply the permissions:

1
2
$ chown prometheus:prometheus /etc/prometheus
$ chown prometheus:prometheus /var/lib/prometheus

Next, update your system:

1
$ apt update && apt upgrade -y

Let’s install prometheus, head over to https://prometheus.io/download/ and get the latest version of prometheus:

1
2
3
4
5
6
7
8
9
10
11
$ wget https://github.com/prometheus/prometheus/releases/download/v2.8.0/prometheus-2.8.0.linux-amd64.tar.gz
$ tar -xf prometheus-2.8.0.linux-amd64.tar.gz
$ cp prometheus-2.8.0.linux-amd64/prometheus /usr/local/bin/
$ cp prometheus-2.8.0.linux-amd64/promtool /usr/local/bin/
$ chown prometheus:prometheus /usr/local/bin/prometheus
$ chown prometheus:prometheus /usr/local/bin/promtool
$ cp -r prometheus-2.8.0.linux-amd64/consoles /etc/prometheus/
$ cp -r prometheus-2.8.0.linux-amd64/console_libraries /etc/prometheus/
$ chown -R prometheus:prometheus /etc/prometheus/consoles
$ chown -R prometheus:prometheus /etc/prometheus/console_libraries
$ rm -rf prometheus-2.8.0.linux-amd64*

Configure Prometheus

We need to tell prometheus to scrape itself in order to get prometheus performance data, edit the prometheus configuration:

1
$ vim /etc/prometheus/prometheus.yml

And add a scrape config: Set the interval on when it needs to scrap, the job name which will be in your metric and the endpoint which it needs to scrape:

1
2
3
4
5
6
7
8
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

Apply permissions to the configured file:

1
$ chown prometheus:prometheus /etc/prometheus/prometheus.yml

Next, we need to define a systemd unit file so we can control the daemon using systemd:

1
$ vim /etc/systemd/system/prometheus.service

The config:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

Since we created a new systemd unit file, we need to reload the systemd daemon, then start the service:

1
2
$ systemctl daemon-reload
$ systemctl start prometheus

Let’s look at the status to see if everything works as expected:

1
2
3
4
5
6
7
8
9
10
11
$ systemctl status prometheus
prometheus.service - Prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-03-26 11:59:10 UTC; 6s ago
 Main PID: 16374 (prometheus)
    Tasks: 9 (limit: 4704)
   CGroup: /system.slice/prometheus.service
           └─16374 /usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=

...
Mar 26 11:59:10 ip-172-31-41-126 prometheus[16374]: level=info ts=2019-03-26T11:59:10.893770598Z caller=main.go:655 msg="TSDB started"

Seems legit! Enable the service on startup:

1
$ systemctl enable prometheus

Install Node Exporter

Now since we have prometheus up and running, we can start adding exporters to publish data into our prometheus time series database. As mentioned before, with node exporter, we will allow prometheus to scrape the node exporter endpoint to consume metrics about the node:

You will find the latest version from their website, which I have added at the top of this post.

1
2
3
4
5
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
$ tar -xf node_exporter-0.17.0.linux-amd64.tar.gz
$ cp node_exporter-0.17.0.linux-amd64/node_exporter /usr/local/bin
$ chown node_exporter:node_exporter /usr/local/bin/node_exporter
$ rm -rf node_exporter-0.17.0.linux-amd64*

Create the systemd unit file:

1
$ vim /etc/systemd/system/node_exporter.service

Apply this configuration:

1
2
3
4
5
6
7
8
9
10
11
12
13
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Reload the systemd daemon and start node exporter:

1
2
$ systemctl daemon-reload
$ systemctl start node_exporter

Look at the status:

1
2
3
4
5
6
7
8
9
10
$ node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-03-26 12:01:39 UTC; 5s ago
 Main PID: 16474 (node_exporter)
    Tasks: 4 (limit: 4704)
   CGroup: /system.slice/node_exporter.service
           └─16474 /usr/local/bin/node_exporter

...
Mar 26 12:01:39 ip-172-31-41-126 node_exporter[16474]: time="2019-03-26T12:01:39Z" level=info msg="Listening on :9100" source="node_exporter.go:111"

If everything looks good, enable the service on boot:

1
$ systemctl enable node_exporter

Configure Node Exporter

Now that we have node exporter running, we need to tell prometheus how to scrape node exporter, so that the node related metrics can end up into prometheus. Edit the prometheus config:

1
$ vim /etc/prometheus/prometheus.yml

I’m providing the full config, but the config is the last section, where you can see the jobname is node_exporter:

1
2
3
4
5
6
7
8
9
10
11
12
13
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

Once the config is saved, restart prometheus and have a look at the status if everything is going as expected:

1
2
$ systemctl restart prometheus
$ systemctl status prometheus

Nginx Reverse Proxy

Let’s add a layer of security and front our setup with a nginx reverse proxy, so that we don’t have to access prometheus on high ports and we have the option to enable basic http authentication. Install nginx:

1
$ apt install nginx apache2-utils -y

Create the authentication file:

1
$ htpasswd -c /etc/nginx/.htpasswd admin

Create the nginx site configuration, this will tel nginx to route connections on port 80, to reverse proxy to localhost, port 9090, if authenticated:

1
2
$ rm /etc/nginx/sites-enabled/default
$ vim /etc/nginx/sites-enabled/prometheus.conf

And this is the config:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    root /var/www/html;
    index index.html index.htm index.nginx-debian.html;
    server_name _;


    location / {
            auth_basic "Prometheus Auth";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass http://localhost:9090;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
}

Reload nginx configuration:

1
$ systemctl reload nginx

Access the Beauty of Prometheus Land!

Once you have authenticated, head over to status, here you will see status info such as your targets, this wil be the endpoints that prometheus is scraping:

image

From the main screen, let’s dive into some queries using PromQL. Also see my Prometheus Cheatsheet.

For the first query, we want to see the available memory of this node in bytes (node_memory_MemAvailable_bytes):

image

Now since the value is in bytes, let’s convert the value to MB, (node_memory_MemAvailable_bytes/1024/1024)

image

Let’s say we want to see the average memory available in 5 minute buckets:

image

That’s a few basic ones, but feel free to checkout my Prometheus Cheatsheet for other examples. I update them as I find more queries.

Thanks

Hope this was informative. I am planning to publish a post on visualizing prometheus data with Grafana (which is EPIC!) and installing Pushgateway for custom integrations.