Ruan Bekker's Blog

From a Curious mind to Posts on Github

Testing Out Scaleway's Kapsule, Their Kubernetes as a Service Offering

At the time of writing (2019.06.10), Scaleway's Kubernetes as a Service offering, named Kapsule, is in private beta. I got access and am pretty stoked about how easy it is to provision a Kubernetes cluster.

What are we doing today?

In this tutorial I will show you how easy it is to provision a 3-node Kubernetes cluster on Scaleway. In an upcoming tutorial, I will set up Traefik as an ingress controller and deploy applications to our cluster. A GitHub repo version of this post is available for now.

Provision a Kapsule Cluster

Head over to Kapsule and provision a Kubernetes Cluster:

At this point in time, I will only create a one node “cluster”, as I want to show how to add pools after the initial creation.

After the cluster has been provisioned, you will get information about your endpoints from the Cluster Information section, which we will need for our ingresses:

Scroll down to download your config:

Move your config in place:

$ mv ~/Downloads/kubeconfig-k8s-mystifying-torvalds.yaml ~/.kube/config
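If you would rather not overwrite an existing kubeconfig, you can alternatively point kubectl at the downloaded file for your current shell session (same path as above):

$ export KUBECONFIG=~/Downloads/kubeconfig-k8s-mystifying-torvalds.yaml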

Interact with your Cluster

Test the connection by getting the info of your nodes in your kubernetes cluster:

$ kubectl get node
NAME                                             STATUS    ROLES     AGE       VERSION
scw-k8s-mystifying-torvalds-default-7f263aabab   Ready     <none>    4m        v1.14.1

Add more nodes:

Provision another pool with 2 more nodes in our cluster:

After the pool has been provisioned, verify that the new nodes have joined the cluster:

$ kubectl get nodes
NAME                                             STATUS    ROLES     AGE       VERSION
scw-k8s-mystifying-torvald-jovial-mclar-25a942   Ready     <none>    2m        v1.14.1
scw-k8s-mystifying-torvald-jovial-mclar-eaf1a2   Ready     <none>    2m        v1.14.1
scw-k8s-mystifying-torvalds-default-7f263aabab   Ready     <none>    15m       v1.14.1

Master / Node Capabilities

Usually, I will label master nodes with node-role.kubernetes.io/master and worker nodes with node-role.kubernetes.io/node to allow container scheduling only on the worker nodes. But Scaleway manages this on their end, and when you list your nodes, the nodes that you see are your “worker” nodes.

The master nodes are managed by Scaleway.
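On a self-managed cluster that labeling and scheduling restriction would roughly look like the sketch below (the node names are placeholders, and the taint is what actually keeps workloads off the master):

$ kubectl label node my-worker-node node-role.kubernetes.io/node=
$ kubectl taint node my-master-node node-role.kubernetes.io/master=:NoSchedule

On Kapsule none of this is needed, since the control plane is not part of your node pool.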

Well Done Scaleway

Just one more reason I really love Kapsule. Simplicity at its best, well done to Scaleway. I hope most people got access to the private beta, but if not, I'm pretty sure they will keep the public informed on public release dates.

Setup a Logstash Server for Amazon Elasticsearch Service and Auth With IAM

logstash

As many of you might know, when you deploy an ELK stack on Amazon Web Services, you only get the E and K of the ELK stack, which is Elasticsearch and Kibana. Here we will be dealing with Logstash on EC2.

What will we be doing

In this tutorial we will set up a Logstash server on EC2, set up an IAM Role and authenticate requests to Elasticsearch with that role, and set up Nginx so that logstash has access logs to ship to Elasticsearch.

I am not fond of working with access keys and secret keys, and the less I have to handle secret information, the better. So instead of creating an access key and secret key for logstash, we will create an IAM Policy that allows the required actions against Elasticsearch, associate that policy with an IAM Role, set EC2 as a trusted entity, and attach that IAM Role to the EC2 instance.

Then we will allow the IAM Role ARN in the Elasticsearch access policy, so when Logstash makes requests against Elasticsearch, it will use the IAM Role to assume temporary credentials to authenticate. That way we don't have to deal with keys. You can still create access keys if that is your preferred method, I'm just not a big fan of keeping secret keys around.

Another benefit of authenticating with IAM is that you can remove a reverse proxy, which would otherwise be another hop in the path to your target.

Create the IAM Policy:

Create an IAM Policy that will allow actions against Elasticsearch:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "es:ESHttpHead",
                "es:ESHttpPost",
                "es:ESHttpGet",
                "es:ESHttpPut"
            ],
            "Resource": "arn:aws:es:eu-west-1:0123456789012:domain/my-es-domain"
        }
    ]
}

Create a role named logstash-system-es with “ec2.amazonaws.com” as the trusted entity in the trust relationship, and associate the above policy with the role.
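If you prefer the CLI over the console, roughly the following achieves the same. This is a hedged sketch: it assumes the policy JSON above is saved as es-policy.json, the policy name logstash-es-access is just a placeholder, and the account ID should be your own:

$ cat > trust.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

$ aws iam create-policy --policy-name logstash-es-access --policy-document file://es-policy.json
$ aws iam create-role --role-name logstash-system-es --assume-role-policy-document file://trust.json
$ aws iam attach-role-policy --role-name logstash-system-es --policy-arn arn:aws:iam::0123456789012:policy/logstash-es-access

$ aws iam create-instance-profile --instance-profile-name logstash-system-es
$ aws iam add-role-to-instance-profile --instance-profile-name logstash-system-es --role-name logstash-system-es

The instance profile is what you then attach to the EC2 instance so that logstash can assume the role.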

Authorize your Role in Elasticsearch Policy

Head over to your Elasticsearch Domain and configure your Elasticsearch Policy to include your IAM Role to grant requests to your Domain:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::0123456789012:role/logstash-system-es"
        ]
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:eu-west-1:0123456789012:domain/my-es-domain/*"
    }
  ]
}

Install Logstash on EC2

I will be using Ubuntu Server 18. Update the repositories and install dependencies:

$ apt update && apt upgrade -y
$ apt install build-essential apt-transport-https -y
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
$ apt update

As logstash requires Java, install the Java OpenJDK Runtime Environment:

$ apt install default-jre -y

Verify that Java is installed:

$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)

Now, install logstash and enable the service on boot:

$ apt install logstash -y
$ systemctl enable logstash.service
$ service logstash stop

Install the Amazon ES Logstash Output Plugin

For us to be able to authenticate using IAM, we should use the Amazon-ES Logstash Output Plugin. Update and install the plugin:

$ /usr/share/logstash/bin/logstash-plugin update
$ /usr/share/logstash/bin/logstash-plugin install logstash-output-amazon_es

Configure Logstash

I like to split up my configuration into 3 parts (input, filter, output).

Let’s create the input configuration: /etc/logstash/conf.d/10-input.conf

input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}

Our filter configuration: /etc/logstash/conf.d/20-filter.conf

filter {
  grok {
    match => { "message" => "%{HTTPD_COMMONLOG}" }
  }
  mutate {
    add_field => {
      "custom_field1" => "hello from: %{host}"
    }
  }
}

And lastly, our output configuration: /etc/logstash/conf.d/30-outputs.conf:

output {
  amazon_es {
      hosts => ["my-es-domain.abcdef.eu-west-1.es.amazonaws.com"]
      index => "new-logstash-%{+YYYY.MM.dd}"
      region => "eu-west-1"
      aws_access_key_id => ''
      aws_secret_access_key => ''
  }
}

Note that the aws_ directives have been left empty, as that seems to be the way they need to be set when using roles. Authentication will be assumed via the role which is associated with the EC2 instance.

If you are using access keys, you can populate them there.

Start Logstash

Start logstash:

$ service logstash start

Tail the logs to see if logstash starts up correctly, it should look more or less like this:

$ tail -f /var/log/logstash/logstash-plain.log

[2019-06-04T16:38:12,087][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.0"}
[2019-06-04T16:38:14,480][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-06-04T16:38:15,226][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://search-my-es-domain-xx.eu-west-1.es.amazonaws.com:443/]}}
[2019-06-04T16:38:15,234][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>https://search-my-es-domain-xx.eu-west-1.es.amazonaws.com:443/, :path=>"/"}

Install Nginx

As you noticed, I have specified /var/log/nginx/access.log as my input file for logstash, as we will test logstash by shipping nginx access logs to Elasticsearch Service.

Install Nginx:

$ apt install nginx -y

Start the service:

$ systemctl restart nginx 
$ systemctl enable nginx

Make a GET request on your Nginx Web Server and inspect the log on Kibana, where it should look like this:
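To generate a log entry to ship, a simple request against the local nginx is enough (run from the instance itself):

$ curl -i http://localhost/

Each request appends a line to /var/log/nginx/access.log, which logstash picks up and ships to Elasticsearch.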

Use Vagrant to Setup a Local Development Environment on Linux

vagrant

Vagrant! Another super product from Hashicorp.

Vagrant makes it really easy to provision virtual servers, which they refer to as “boxes”, enabling developers to run their jobs/tasks/applications in a really easy and fast way. Vagrant utilizes a declarative configuration model, so you can describe which OS you want, bootstrap it with installation instructions as soon as it boots, etc.

What are we doing today?

After completing this tutorial, you should be able to launch an Ubuntu virtual server locally with Vagrant, using the VirtualBox provider, which will be responsible for running our VMs.

I am running this on an Ubuntu 19 desktop, but you can run this on Mac/Windows/Linux. First we will install VirtualBox, then Vagrant, and then we will provision an Ubuntu box. I will also show how to inject shell commands into your Vagrantfile so that you can provision software on your VM, and how to forward traffic from your host to a web server on the guest.

Virtualbox

Install some pre-requirements:

$ sudo apt-get install dkms build-essential linux-headers-`uname -r`

Head over to Virtualbox’s download page and grab the latest version of virtualbox and install it.

After the installation run vboxconfig to build the kernel modules. If you get the error that I received as seen below:

$ sudo /sbin/vboxconfig

vboxdrv.sh: Building VirtualBox kernel modules
vboxdrv.sh: Starting VirtualBox services
vboxdrv.sh: Building VirtualBox kernel modules
vboxdrv.sh: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why

This resource on askubuntu.com helped me out. In short, there's a requirement that all the kernel modules must be signed by a key trusted by the UEFI system.

To resolve:

$ sudo apt-get install linux-headers-generic build-essential dkms
$ sudo apt-get remove --purge virtualbox-dkms
$ sudo apt-get install virtualbox-dkms

$ openssl req -new -x509 -newkey rsa:2048 -keyout MOK.priv -outform DER -out MOK.der -nodes -days 36500 -subj "/CN=Descriptive common name/"
$ sudo /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 ./MOK.priv ./MOK.der $(modinfo -n vboxdrv)
$ sudo mokutil --import MOK.der

Remember the password, as you will require it when you reboot. You will get the option to “Enroll MOK”, select that, enter the initial password and reboot.

$ sudo reboot

You should be able to get a response from the binary:

$ VirtualBox -h
Oracle VM VirtualBox VM Selector v6.0.6_Ubuntu

Install Vagrant

Head over to Vagrant’s installation page, get the latest version for your operating system and install it.

After installing it you should get the following response:

$ vagrant --version
Vagrant 2.2.4

Provision a Box with Vagrant

When you head over to app.vagrantup.com/boxes/search you can select the pre-packed operating system of your choice. For this demonstration, I went with ubuntu/trusty64.

First we need to initialize a new Vagrant environment by creating a Vagrantfile. As we pass the name of our operating system, it will be populated in the Vagrantfile:

$ vagrant init ubuntu/trusty64

A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.

Now since the Vagrantfile has been placed in our current working directory, let’s have a look at it:

$ cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  # config.vm.network "forwarded_port", guest: 80, host: 8080
  # config.vm.network "forwarded_port", guest: 80, host: 8080, host_ip: "127.0.0.1"
  # config.vm.network "private_network", ip: "192.168.33.10"
  # config.vm.network "public_network"
  # config.vm.synced_folder "../data", "/vagrant_data"
  #
  # config.vm.provider "virtualbox" do |vb|
  #   vb.gui = true
  #   vb.memory = "1024"
  # end
  #
  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL
end

As you can see, the Vagrantfile holds a set of instructions describing how we want our VM to be configured. At this moment you will only see that the image is defined as ubuntu/trusty64.

Let’s start our VM:

$ vagrant up

Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/trusty64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'ubuntu/trusty64' version '20190429.0.1' is up to date...
==> default: Setting the name of the VM: vagrant_default_1559238982328_97737
==> default: Clearing any previously set forwarded ports...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default:
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.

Now that our VM has been booted, we can ssh to our server by simply running:

$ vagrant ssh
ubuntu-server $

Making changes to your config

So let’s say we want to edit our Vagrantfile to provide shell commands to install nginx and forward our host port 8080 to our guest port 80, so that we can access our VM’s webserver on localhost using port 8080.

Edit your Vagrantfile so that it looks like this:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "forwarded_port", guest: 80, host: 8080
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install nginx -y
  SHELL
end

In order to run the shell provisioner, we need to call the provision command:

$ vagrant provision

That will install nginx on our VM. Then call reload to apply the port forwarding configuration:

$ vagrant reload

Now that everything is in order, we can access our nginx web server:

$ curl -i http://localhost:8080
HTTP/1.1 200
Server: nginx
..

Tear down

Delete the server by running:

$ vagrant destroy
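If you want to confirm that the VM is really gone (or check its state at any point), Vagrant can report on it:

$ vagrant status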

Install Blackbox Exporter to Monitor Websites With Prometheus

prometheus

Blackbox Exporter by Prometheus allows probing endpoints over protocols such as HTTP, HTTPS, ICMP, TCP and DNS.

What will we be doing

In this tutorial we will install the blackbox exporter on Linux. I'm assuming that you have already set up Prometheus.

Install the Blackbox Exporter

First create the blackbox exporter user:

$ useradd --no-create-home --shell /bin/false blackbox_exporter

Download blackbox exporter and extract:

$ wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.14.0/blackbox_exporter-0.14.0.linux-amd64.tar.gz
$ tar -xvf blackbox_exporter-0.14.0.linux-amd64.tar.gz

Move the binaries in place and change the ownership:

$ cp blackbox_exporter-0.14.0.linux-amd64/blackbox_exporter /usr/local/bin/blackbox_exporter
$ chown blackbox_exporter:blackbox_exporter /usr/local/bin/blackbox_exporter

Remove the downloaded archive:

$ rm -rf blackbox_exporter-0.14.0.linux-amd64*

Create the blackbox directory and create the config:

$ mkdir /etc/blackbox_exporter
$ vim /etc/blackbox_exporter/blackbox.yml

Populate this config:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []
      method: GET

Update the permissions of the config so that the user has ownership:

$ chown blackbox_exporter:blackbox_exporter /etc/blackbox_exporter/blackbox.yml

Create the systemd unit file:

$ vim /etc/systemd/system/blackbox_exporter.service

Populate the systemd unit file configuration:

[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox_exporter
Group=blackbox_exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

[Install]
WantedBy=multi-user.target

Reload the systemd daemon and restart the service:

$ systemctl daemon-reload
$ systemctl start blackbox_exporter

The service should be started, verify:

$ systemctl status blackbox_exporter
  blackbox_exporter.service - Blackbox Exporter
   Loaded: loaded (/etc/systemd/system/blackbox_exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-05-08 00:02:40 UTC; 5s ago
 Main PID: 10084 (blackbox_export)
    Tasks: 6 (limit: 4704)
   CGroup: /system.slice/blackbox_exporter.service
           └─10084 /usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

May 08 00:02:40 ip-172-31-41-126 systemd[1]: Started Blackbox Exporter.
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.5229204Z caller=main.go:213 msg="Starting blackbox_exporter" version="(version=0.14.0, branch=HEAD, revision=bb
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.52553523Z caller=main.go:226 msg="Loaded config file"
May 08 00:02:40 ip-172-31-41-126 blackbox_exporter[10084]: level=info ts=2019-05-08T00:02:40.525695324Z caller=main.go:330 msg="Listening on address" address=:9115

Enable the service on boot:

$ systemctl enable blackbox_exporter

Configure Prometheus

Next, we need to tell prometheus what to monitor. We will configure prometheus to monitor a web endpoint on port 8080 using the blackbox exporter (we will run a Python SimpleHTTPServer on port 8080).

Edit the prometheus config /etc/prometheus/prometheus.yml and append the following:

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://localhost:8080
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Open a new terminal and create an index.html:

$ echo "ok" > index.html

Then start a SimpleHTTPServer on port 8080:

$ python -m SimpleHTTPServer 8080

Head back to the previous terminal session and restart prometheus:

$ systemctl restart prometheus
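You can also hit the blackbox exporter's probe endpoint directly to confirm that the http_2xx module succeeds against the test server (a quick check, assuming the exporter is listening on port 9115 as configured above):

$ curl -s "http://localhost:9115/probe?module=http_2xx&target=http://localhost:8080" | grep '^probe_success'
probe_success 1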

Configure the Alarm definition:

Create an alert definition that defines when to notify us when an endpoint goes down:

$ vim /etc/prometheus/alert.rules.yml

And our alert definition:

groups:
- name: alert.rules
  rules:
  - alert: EndpointDown
    expr: probe_success == 0
    for: 10s
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint  down"

Ensure that the permission is set:

$ chown prometheus:prometheus /etc/prometheus/alert.rules.yml

Use the promtool to validate that the alert is correctly configured:

$ promtool check rules /etc/prometheus/alert.rules.yml
Checking /etc/prometheus/alert.rules.yml
  SUCCESS: 1 rules found

If everything is good, restart prometheus:

$ systemctl restart prometheus

Blackbox Exporter Dashboard

To install a blackbox exporter dashboard: https://grafana.com/dashboards/7587, create a new dashboard, select import, provide the ID: 7587, select the prometheus datasource and select save.

The dashboard should look similar to this:

blackbox-exporter

Next up, Alertmanager

In the next tutorial we will set up Alertmanager to alert us when our endpoint goes down.

Resources

See all #prometheus blogposts

Install Alertmanager to Alert Based on Metrics From Prometheus

prometheus

So we are pushing our time series metrics into prometheus, and now we would like to alert based on certain metric dimensions. That's where alertmanager fits in. We can set up targets and rules, and once the rules for our targets no longer match, we can alert to destinations such as Slack, email, etc.

What we will be doing:

In our previous tutorial we installed the blackbox exporter to probe an endpoint. Now we will install Alertmanager and configure an alert to notify us via email and slack when our endpoint goes down. See this post if you have not seen the previous tutorial.

Install Alertmanager

Create the user for alertmanager:

$ useradd --no-create-home --shell /bin/false alertmanager

Download alertmanager and extract:

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
$ tar -xvf alertmanager-0.17.0.linux-amd64.tar.gz

Move the alertmanager and amtool binaries in place:

$ cp alertmanager-0.17.0.linux-amd64/alertmanager /usr/local/bin/
$ cp alertmanager-0.17.0.linux-amd64/amtool /usr/local/bin/

Ensure that the correct permissions are in place:

$ chown alertmanager:alertmanager /usr/local/bin/alertmanager
$ chown alertmanager:alertmanager /usr/local/bin/amtool

Cleanup:

$ rm -rf alertmanager-0.17.0*

Configure Alertmanager:

Create the alertmanager directory and configure the global alertmanager configuration:

$ mkdir /etc/alertmanager
$ vim /etc/alertmanager/alertmanager.yml

Provide the global config and make sure to populate your personal information. See this post to create a slack webhook.

global:
  smtp_smarthost: 'smtp.domain.net:587'
  smtp_from: 'AlertManager <mailer@domain.com>'
  smtp_require_tls: true
  smtp_hello: 'alertmanager'
  smtp_auth_username: 'username'
  smtp_auth_password: 'password'

  slack_api_url: 'https://hooks.slack.com/services/x/xx/xxx'

route:
  group_by: ['instance', 'alert']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-1

receivers:
  - name: 'team-1'
    email_configs:
      - to: 'user@domain.com'
    slack_configs:
      # https://prometheus.io/docs/alerting/configuration/#slack_config
      - channel: 'system_events'
        username: 'AlertManager'
        icon_emoji: ':joy:'

Ensure the permissions are in place:

$ chown alertmanager:alertmanager -R /etc/alertmanager
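Optionally, amtool (installed alongside alertmanager above) can validate the configuration before you start the service:

$ amtool check-config /etc/alertmanager/alertmanager.yml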

Create the alertmanager systemd unit file:

$ vim /etc/systemd/system/alertmanager.service

And supply the unit file configuration. Note that I am exposing port 9093 directly, as I'm not using a reverse proxy.

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager/
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.external-url http://0.0.0.0:9093

[Install]
WantedBy=multi-user.target

Now we need to inform prometheus that it should send alerts to alertmanager on its exposed port:

$ vim /etc/prometheus/prometheus.yml

And supply the alertmanager configuration for prometheus:

...
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
...

When we get alerted, the notification will include a link back to the alert, so we need to provide the base URL for that link. That gets done in our alertmanager systemd unit file, /etc/systemd/system/alertmanager.service, under --web.external-url, passing the alertmanager base IP address:

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager/
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --web.external-url http://<your.alertmanager.ip.address>:9093

[Install]
WantedBy=multi-user.target

Then we need to do the same with the prometheus systemd unit file, /etc/systemd/system/prometheus.service, under --web.external-url, passing the prometheus base IP address:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.external-url http://<your.prometheus.ip.address>

[Install]
WantedBy=multi-user.target

Since we have edited the systemd unit files, we need to reload the systemd daemon:

$ systemctl daemon-reload

Then restart prometheus and alertmanager:

$ systemctl restart prometheus
$ systemctl restart alertmanager

Inspect the status of alertmanager and prometheus:

$ systemctl status alertmanager
$ systemctl status prometheus

If everything seems good, enable alertmanager on boot:

$ systemctl enable alertmanager

Access Alertmanager:

Access alertmanager on your endpoint on port 9093:

alertmanager

From our previous tutorial we started a local web service on port 8080 that is being monitored by prometheus. Let’s stop that service to test out the alerting. You should get a notification via email:

alertmanager

And the notification via slack:

alertmanager

When you start the service again and head over to the prometheus ui under alerts, you will see that the service recovered:

prometheus

Install Prometheus Alertmanager Plugin

Install the Prometheus Alertmanager Plugin in Grafana. Head to the instance where grafana is installed and install the plugin:

$ grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource

Once the plugin is installed, restart grafana:

$ service grafana-server restart

Install the dashboard grafana.com/dashboards/8010. Create a new datasource, select the prometheus-alertmanager datasource, configure and save.

Add a new dashboard, select import and provide the ID 8010, select the prometheus-alertmanager datasource and save. You should see the following (more or less):

prometheus-alertmanager

Resources

See all #prometheus blogposts

Install Grafana to Visualize Your Metrics From Datasources Such as Prometheus on Linux

image

Grafana is an open-source dashboarding service that allows you to monitor, analyze and graph metrics from datasources such as prometheus, influxdb, elasticsearch, aws cloudwatch, and many more.

Not only is grafana amazing, it's super pretty!

Example of what a dashboard might look like:

image

What are we doing today

In this tutorial we will setup grafana on linux. If you have not set up prometheus, follow this blogpost to install prometheus.

Install Grafana

I will be demonstrating how to install grafana on debian; if you have another operating system, head over to the grafana documentation for other supported operating systems.

Get the gpg key:

$ curl https://packages.grafana.com/gpg.key | sudo apt-key add -

Import the public keys:

$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys  8C8C34C524098CB6 

Add the latest stable packages to your repository:

$ add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

Install a pre-requirement package:

$ apt install apt-transport-https -y

Update the repository index and install grafana:

$ apt update && sudo apt install grafana -y

Once grafana is installed, start the service:

$ service grafana-server start

Then enable the service on boot:

$ update-rc.d grafana-server defaults

If you want to control the service via systemd:

$ systemctl daemon-reload
$ systemctl start grafana-server
$ systemctl status grafana-server

Optional: Nginx Reverse Proxy

If you want to front your grafana instance with a nginx reverse proxy:

$ cat /etc/nginx/sites-enabled/grafana
server {
    listen 80;
    server_name grafana.domain.com;

    location / {
        proxy_pass http://127.0.0.1:3000/;
        proxy_redirect http://127.0.0.1:3000/ /;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Then restart nginx:

$ systemctl restart nginx

Access Grafana

If you are accessing grafana directly, browse to http://your-grafana-ip:3000/ and log in with the default username admin and password admin.
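Grafana also exposes a small health endpoint, which is handy if you want to verify from the command line that the service is up before logging in:

$ curl -s http://localhost:3000/api/health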

Dashboarding Tutorials

Have a look at this screencast where the guys from grafana show you how to build dashboards:

Also have a look at their public repository of dashboards

For more tutorials on prometheus and metrics have a look at #prometheus

Install Pushgateway to Expose Metrics to Prometheus

In most cases when we want to scrape a node for metrics, we will install node-exporter on a host and configure prometheus to scrape the configured node to consume metric data. But in certain cases we want to push custom metrics to prometheus. In such cases, we can make use of pushgateway.

Pushgateway allows you to push custom metrics to pushgateway's endpoint, and we then configure prometheus to scrape pushgateway to consume the exposed metrics into prometheus.

Pre-Requirements

If you have not set up Prometheus, head over to this blogpost to set up prometheus on Linux.

What we will do?

In this tutorial, we will setup pushgateway on linux and after pushgateway has been setup, we will push some custom metrics to pushgateway and configure prometheus to scrape metrics from pushgateway.

Install Pushgateway

Get the latest version of pushgateway from prometheus.io, then download and extract:

$ wget https://github.com/prometheus/pushgateway/releases/download/v0.8.0/pushgateway-0.8.0.linux-amd64.tar.gz
$ tar -xvf pushgateway-0.8.0.linux-amd64.tar.gz

Create the pushgateway user:

$ useradd --no-create-home --shell /bin/false pushgateway

Move the binary in place and update the permissions to the user that we created:

$ cp pushgateway-0.8.0.linux-amd64/pushgateway /usr/local/bin/pushgateway
$ chown pushgateway:pushgateway /usr/local/bin/pushgateway

Create the systemd unit file:

$ cat > /etc/systemd/system/pushgateway.service << EOF
[Unit]
Description=Pushgateway
Wants=network-online.target
After=network-online.target

[Service]
User=pushgateway
Group=pushgateway
Type=simple
ExecStart=/usr/local/bin/pushgateway \
    --web.listen-address=":9091" \
    --web.telemetry-path="/metrics" \
    --persistence.file="/tmp/metric.store" \
    --persistence.interval=5m \
    --log.level="info" \
    --log.format="logger:stdout?json=true"

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and restart the pushgateway service:

$ systemctl daemon-reload
$ systemctl restart pushgateway

Ensure that pushgateway has been started:

$ systemctl status pushgateway
  pushgateway.service - Pushgateway
   Loaded: loaded (/etc/systemd/system/pushgateway.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-05-07 09:05:57 UTC; 2min 33s ago
 Main PID: 6974 (pushgateway)
    Tasks: 6 (limit: 4704)
   CGroup: /system.slice/pushgateway.service
           └─6974 /usr/local/bin/pushgateway --web.listen-address=:9091 --web.telemetry-path=/metrics --persistence.file=/tmp/metric.store --persistence.interval=5m --log.level=info --log.format=logger:st

May 07 09:05:57 ip-172-31-41-126 systemd[1]: Started Pushgateway.

Configure Prometheus

Now we want to configure prometheus to scrape pushgateway for metrics, then the scraped metrics will be injected into prometheus’s time series database:

At the moment, I have prometheus, node-exporter and pushgateway on the same node, so I will provide my complete prometheus configuration. If you are just looking for the pushgateway config, it is the last job in the file:

$ cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']

Restart prometheus:

$ systemctl restart prometheus

Push metrics to pushgateway

First we will look at a bash example to push metrics to pushgateway:

$ echo "cpu_utilization 20.25" | curl --data-binary @- http://localhost:9091/metrics/job/my_custom_metrics/instance/10.20.0.1:9000/provider/hetzner

Have a look at pushgateway’s metrics endpoint:

$ curl -L http://localhost:9091/metrics/
# TYPE cpu_utilization untyped
cpu_utilization{instance="10.20.0.1:9000",job="my_custom_metrics",provider="hetzner"} 20.25

Let’s look at a python example on how we can push metrics to pushgateway:

import requests

# labels that make up the pushgateway grouping key
job_name='my_custom_metrics'
instance_name='10.20.0.1:9000'
provider='hetzner'

# the metric name and value that we want to push
payload_key='cpu_utilization'
payload_value='21.90'

# push the metric to pushgateway under /job/<job>/instance/<instance>/provider/<provider>
response = requests.post('http://localhost:9091/metrics/job/{j}/instance/{i}/provider/{p}'.format(j=job_name, i=instance_name, p=provider), data='{k} {v}\n'.format(k=payload_key, v=payload_value))
print(response.status_code)

With this method, you can push any custom metrics (from bash, a lambda function, etc.) to pushgateway and allow prometheus to consume that data into its time series database.
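Pushed metrics stay in pushgateway until they are replaced or deleted. To drop the group we pushed above, a DELETE against the same grouping key works (same labels as in the bash example):

$ curl -X DELETE http://localhost:9091/metrics/job/my_custom_metrics/instance/10.20.0.1:9000/provider/hetzner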

Resources:

See #prometheus for more posts on Prometheus

Running a HA MySQL Galera Cluster on Docker Swarm

image

In this post we will setup a highly available mysql galera cluster on docker swarm.

About

The service is based on the docker-mariadb-cluster repository and it's designed not to have any persistent data attached to the service, but to rely on the “nodes” to replicate the data.

Note that although this proof of concept works, I always recommend using a remote mysql database outside your cluster, such as RDS, etc.

Since we don’t persist any data on the mysql cluster, I have associated a dbclient service that will run continious backups, which we will persist the path where the backups reside to disk.

Deploy the MySQL Cluster

The docker-compose.yml that we will use looks like this:

version: '3.5'
services:
  dbclient:
    image: alpine
    environment:
      - BACKUP_ENABLED=1
      - BACKUP_INTERVAL=3600
      - BACKUP_PATH=/data
      - BACKUP_FILENAME=db_backup
    networks:
      - dbnet
    entrypoint: |
      sh -c 'sh -s << EOF
      apk add --no-cache mysql-client
      while true
        do
          if [ $$BACKUP_ENABLED == 1 ]
            then
              sleep $$BACKUP_INTERVAL
              mkdir -p $$BACKUP_PATH/$$(date +%F)
              echo "$$(date +%FT%H.%m) - Making Backup to : $$BACKUP_PATH/$$(date +%F)/$$BACKUP_FILENAME-$$(date +%FT%H.%m).sql.gz"
              mysqldump -u root -ppassword -h dblb --all-databases | gzip > $$BACKUP_PATH/$$(date +%F)/$$BACKUP_FILENAME-$$(date +%FT%H.%m).sql.gz
          find $$BACKUP_PATH -mtime +7 -delete
          fi
        done
      EOF'
    volumes:
      - vol_dbclient:/data
    deploy:
      mode: replicated
      replicas: 1

  dbcluster:
    image: toughiq/mariadb-cluster
    networks:
      - dbnet
    environment:
      - DB_SERVICE_NAME=dbcluster
      - MYSQL_ROOT_PASSWORD=password
      - MYSQL_DATABASE=mydb
      - MYSQL_USER=mydbuser
      - MYSQL_PASSWORD=mydbpass
    deploy:
      mode: replicated
      replicas: 1

  dblb:
    image: toughiq/maxscale
    networks:
      - dbnet
    ports:
      - 3306:3306
    environment:
      - DB_SERVICE_NAME=dbcluster
      - ENABLE_ROOT_USER=1
    deploy:
      mode: replicated
      replicas: 1

volumes:
  vol_dbclient:
    driver: local

networks:
  dbnet:
    name: dbnet
    driver: overlay

The dbclient is configured to be in the same network as the cluster so it can reach the mysql service. The default behavior is that it will make a backup every hour (3600 seconds) to the /data/{date}/ path.

Deploy the stack:

$ docker stack deploy -c docker-compose.yml galera
Creating network dbnet
Creating service galera_dbcluster
Creating service galera_dblb
Creating service galera_dbclient

Have a look to see if all the services are running:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
jm7p70qre72u        galera_dbclient     replicated          1/1                 alpine:latest
p8kcr5y7szte        galera_dbcluster    replicated          1/1                 toughiq/mariadb-cluster:latest
1hu3oxhujgfm        galera_dblb         replicated          1/1                 toughiq/maxscale:latest          :3306->3306/tcp

The Backup Client

As mentioned the backup client backs up to the /data/ path:

$ docker exec -it $(docker ps -f name=galera_dbclient -q) find /data/
/data/
/data/2019-05-10
/data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

Let’s go ahead and populate some data into our mysql database:

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb
MySQL [(none)]> create table mydb.foo (name varchar(10));
MySQL [(none)]> insert into mydb.foo values('ruan');
MySQL [(none)]> exit

Scale the Cluster

At the moment we only have 1 replica for our mysql cluster, let’s go ahead and scale the cluster to 3 replicas:

$ docker service scale galera_dbcluster=3
galera_dbcluster scaled to 3
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged

Verify that the service has been scaled:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
jm7p70qre72u        galera_dbclient     replicated          1/1                 alpine:latest
p8kcr5y7szte        galera_dbcluster    replicated          3/3                 toughiq/mariadb-cluster:latest
1hu3oxhujgfm        galera_dblb         replicated          1/1                 toughiq/maxscale:latest          :3306->3306/tcp

Test, by reading from the database:

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Simulate a Node Failure:

Simulate a node failure by killing one of the mysql containers:

$ docker kill 9e336032ab52
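If you need to look up the container ID of one of the dbcluster replicas first (the ID above is just an example), list the containers by service name on the node that runs the task:

$ docker ps -f name=galera_dbcluster --format '{{.ID}}  {{.Names}}'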

Verify that one container is missing from our service:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
p8kcr5y7szte        galera_dbcluster    replicated          2/3                 toughiq/mariadb-cluster:latest

While the replacement container is provisioning and we have 2 out of 3 containers running, read the data 3 times to test that the round-robin queries don't hit the affected container (the dblb won't route traffic to it):

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

$ docker exec -it $(docker ps -f name=galera_dbclient -q) mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Verify that the 3rd container has checked in:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                            PORTS
p8kcr5y7szte        galera_dbcluster    replicated          3/3                 toughiq/mariadb-cluster:latest

How to Restore?

I’m deleting the database to simulate the scenario where we need to restore:

$ docker exec -it $(docker ps -f name=galera_dbclient -q) sh
> mysql -uroot -ppassword -h dblb -e'drop database mydb;'

Ensure the db is not present:

> mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
ERROR 1146 (42S02) at line 1: Table 'mydb.foo' doesn't exist

Find the archive and extract:

> find /data/
/data/
/data/2019-05-10
/data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

> gunzip /data/2019-05-10/db_backup-2019-05-10T10.05.sql.gz

Restore the backed up database to MySQL:

> mysql -uroot -ppassword -h dblb < /data/2019-05-10/db_backup-2019-05-10T10.05.sql

Test that we can read our data:

> mysql -uroot -ppassword -h dblb -e'select * from mydb.foo;'
+------+
| name |
+------+
| ruan |
+------+

Create Secrets With Vault's Transit Secrets Engine

Vault’s transit secrets engine handles cryptographic functions on data-in-transit. Vault doesn’t store the data sent to the secrets engine, so it can also be viewed as encryption as a service.

In this tutorial we will demonstrate how to use Vault’s Transit Secret Engine.


Enable the Transit Engine:

Enable transit secret engine using the /sys/mounts endpoint:

$ curl --header "X-Vault-Token: $VAULT_TOKEN" -XPOST -d '{"type": "transit", "description": "encs encryption"}' http://127.0.0.1:8200/v1/sys/mounts/transit

Create the Key Ring:

Create an encryption key ring named fookey using the transit/keys endpoint:
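The read request shown below assumes the key ring already exists; a minimal sketch of the create call against the same endpoint looks like this:

$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" -XPOST http://127.0.0.1:8200/v1/transit/keys/fookey

Reading the key back then returns its metadata: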

$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" -XGET http://127.0.0.1:8200/v1/transit/keys/fookey | jq
{
  "request_id": "8375227a-4a9f-a108-0b89-84c448419e80",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "allow_plaintext_backup": false,
    "deletion_allowed": false,
    "derived": false,
    "exportable": false,
    "keys": {
      "1": 1554654295
    },
    "latest_version": 1,
    "min_available_version": 0,
    "min_decryption_version": 1,
    "min_encryption_version": 0,
    "name": "fookey",
    "supports_decryption": true,
    "supports_derivation": true,
    "supports_encryption": true,
    "supports_signing": false,
    "type": "aes256-gcm96"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Encoding

Encode your string:

$ base64 <<< "hello world"
aGVsbG8gd29ybGQK

Encrypt

To encrypt your secret, use the transit/encrypt endpoint:

$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request POST  --data '{"plaintext": "aGVsbG8gd29ybGQK"}' http://127.0.0.1:8200/v1/transit/encrypt/fookey | jq
{
  "request_id": "ab00ba0f-9e45-0aca-e3c1-7765fd83fc3c",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "ciphertext": "vault:v1:Yo4U6xXFM2FoBOaUrw0w3EpSlJS6gmsa4HP1xKtjrk0+xSqi5Rvjvg=="
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Decrypt:

Use the transit/decrypt endpoint to decrypt the ciphertext:

$ curl -s --header "X-Vault-Token: $VAULT_TOKEN" --request POST  --data '{"ciphertext": "vault:v1:Yo4U6xXFM2FoBOaUrw0w3EpSlJS6gmsa4HP1xKtjrk0+xSqi5Rvjvg=="}' http://127.0.0.1:8200/v1/transit/decrypt/fookey | jq
{
  "request_id": "3d9743a0-2daf-823c-f413-8c8a90753479",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "plaintext": "aGVsbG8gd29ybGQK"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Decoding

Decode the response:

$ base64 --decode <<< "aGVsbG8gd29ybGQK"
hello world

Resources