Ruan Bekker's Blog

From a Curious mind to Posts on Github

How to Setup Alerting With Loki

image

Recently Grafana Labs announced Loki v2 and its awesome! Definitely check out their blog post on more details.

Loki has a index option called boltdb-shipper, which allows you to run Loki with only a object store and you no longer need a dedicated index store such as DynamoDB. You can extract labels from log lines at query time, which is CRAZY! And I really like how they’ve implemented it, you can parse, filter and format like mad. I really like that.

And then generating alerts from any query, which we will go into today. Definitely check out this blogpost and this video for more details on the features of Loki v2.

What will we be doing today

In this tutorial we will setup a alert using the Loki local ruler to alert us when we have high number of log events coming in. For example, let’s say someone has debug logging enabled in their application and we want to send a alert to slack when it breaches the threshold.

I will simulate this with a http-client container which runs curl in a while loop to fire a bunch of http requests against the nginx container which logs to Loki, so we can see how the alerting works, and in this scenario we will alert to Slack.

And after that we will stop our http-client container to see how the alarm resolves when the log rate comes down again.

All the components are available in the docker-compose.yml on my github repository

Components

Let’s break it down and start with the loki config:

1
2
3
4
5
6
7
8
9
10
11
12
13
...
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true

In the section of the loki config, I will be making use of the local ruler and map my alert rules under /etc/loki/rules/ and we are also defining our alertmanager instance where these alerts should be shipped to.

In my rule definition /etc/loki/rules/demo/rules.yml:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
groups:
  - name: rate-alerting
    rules:
      - alert: HighLogRate
        expr: |
          sum by (compose_service)
            (rate({job="dockerlogs"}[1m]))
            > 60
        for: 1m
        labels:
            severity: warning
            team: devops
            category: logs
        annotations:
            title: "High LogRate Alert"
            description: "something is logging a lot"
            impact: "impact"
            action: "action"
            dashboard: "https://grafana.com/service-dashboard"
            runbook: "https://wiki.com"
            logurl: "https://grafana.com/log-explorer"

In my expression, I am using LogQL to return per second rate of all my docker logs within the last minute per compose service for my dockerlogs job and we are specifying that it should alert when the threshold is above 60.

As you can see I have a couple of labels and annotations, which becomes very useful when you have dashboard links, runbooks etc and you would like to map that to your alert. I am doing the mapping in my alertmanager.yml config:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
route:
...
  receiver: 'default-catchall-slack'
  routes:
  - match:
      severity: warning
    receiver: warning-devops-slack
    routes:
    - match_re:
        team: .*(devops).*
      receiver: warning-devops-slack

receivers:
...
- name: 'warning-devops-slack'
  slack_configs:
    - send_resolved: true
      channel: '__SLACK_CHANNEL__'
      title: ':fire::white_check_mark: []  '
      text: >-
        
          *Description:* 
          *Severity:* ``
          *Graph:* <|:chart_with_upwards_trend:><|:chart_with_upwards_trend:> *Dashboard:* <|:bar_chart:> *Runbook:* <|:spiral_note_pad:>
          *Details:*
           - *:* ``
          
           - *Impact*: 
           - *Receiver*: warning--slack
        

As you can see, when my alert matches nothing it will go to my catchall receiver, but when my label contains devops and the route the alert to my warning-devops-slack receiver, and then we will be parsing our labels and annotations to include the values in our alarm on slack.

Demo

Enough with the background details, and it’s time to get into the action.

All the code for this demonstration will be available in my github repository: github.com/ruanbekker/loki-alerts-docker

The docker-compose will have a container of grafana, alertmanager, loki, nginx and a http-client.

The http-client is curl in a while loop that will just make a bunch of http requests against the nginx container, which will be logging to loki.

Get the source

Get the code from my github repository:

1
2
$ git clone https://github.com/ruanbekker/loki-alerts-docker
$ cd loki-alerts-docker

You will need to replace the slack webhook url and the slack channel where you want your alerts to be sent to. This will take the environment variables and replace the values in config/alertmanager.yml (always check out the script first, before executing it)

1
$ SLACK_WEBHOOK_URL="https://hooks.slack.com/services/xx/xx/xx" SLACK_CHANNEL="#notifications" ./parse_configs.sh

You can double check by running cat config/alertmanager.yml, once you are done, boot the stack:

1
$ docker-compose up -d

Open up grafana:

1
$ open http://grafana.localdns.xyz:3000

Use the initial user and password combination admin/admin and then reset your password:

image

Browse for your labels on the log explorer section, in my example it will be {job="dockerlogs"}:

image

When we select our job=“dockerlogs” label, we will see our logs:

image

As I explained earlier the query that we will be running in our ruler, can be checked what the rate currently is:

1
sum by (compose_project, compose_service) (rate({job="dockerlogs"}[1m]))

Which will look like this:

image

In the configured expression in our ruler config, we have set to alarm once the value goes above 60, we can validate this by running:

1
sum by (compose_project, compose_service) (rate({job="dockerlogs"}[1m])) > 60

And we can verify that this is the case, and by now it should be alarming:

image

Head over to alertmanager:

1
$ open http://alertmanager.localdns.xyz:9093

We can see alertmanager is showing the alarm:

image

When we head over to slack, we can see our notification:

image

So let’s stop our http client:

1
2
$ docker-compose stop http-client
Stopping http-client ... done

Then we can see the logging stopped:

image

And in slack, we should see that the alarm recovered and we should see the notification:

image

Then you can terminate your stack:

1
$ docker-compose down

Pretty epic stuff right? I really love how cost effective Loki is as logging use to be so expensive to run and especially maintain, Grafana Labs are really doing some epic work and my hat goes off to them.

Thanks

I hope you found this useful, feel free to reach out to me on Twitter @ruanbekker or visit me on my website ruan.dev

-->

Comments