Recently Grafana Labs announced Loki v2 and it’s awesome! Definitely check out their blog post for more details.
Loki has an index option called boltdb-shipper, which allows you to run Loki with only an object store, so you no longer need a dedicated index store such as DynamoDB. You can also extract labels from log lines at query time, which is CRAZY! I really like how they’ve implemented it: you can parse, filter and format like mad.
What will we be doing today
In this tutorial we will set up an alert using the Loki local ruler to alert us when we have a high number of log events coming in. For example, let’s say someone has debug logging enabled in their application and we want to send an alert to Slack when the log rate breaches the threshold.
I will simulate this with an http-client container which runs curl in a while loop to fire a bunch of HTTP requests against the nginx container, which logs to Loki. That way we can see how the alerting works, and in this scenario we will alert to Slack.
And after that we will stop our http-client container to see how the alarm resolves when the log rate comes down again.
All the components are available in the docker-compose.yml on my GitHub repository.
Let’s break it down and start with the Loki config. In the ruler section of the Loki config, I am making use of the local ruler and mapping my alert rules under /etc/loki/rules/, and we are also defining the Alertmanager instance that these alerts should be shipped to.
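The ruler section described above could look roughly like the sketch below. Only the rules directory comes from this post; the Alertmanager hostname and port, the temporary rule path and the in-memory ring are assumptions for a single-node docker-compose setup, so adjust them to your own environment.

```yaml
# Sketch of a Loki v2 ruler section (values other than the rules
# directory are assumptions, not necessarily the author's settings)
ruler:
  storage:
    type: local                     # read rule files from the local filesystem
    local:
      directory: /etc/loki/rules    # where the alert rules are mounted
  rule_path: /tmp/loki/rules-temp   # assumed scratch directory for the ruler
  alertmanager_url: http://alertmanager:9093   # assumed service name and port
  ring:
    kvstore:
      store: inmemory               # assumed: single instance, no external KV store
  enable_api: true
  enable_alertmanager_v2: true
```

With the local storage type, the ruler looks for rule files in a per-tenant subdirectory. When auth_enabled is false the tenant is named fake, so the rules would live under /etc/loki/rules/fake/.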
In my rule definition, the expression uses LogQL to return the per-second rate of all my Docker logs over the last minute, per compose service, for my dockerlogs job, and we specify that it should alert when that rate goes above 60.
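A rule file along those lines might look like the following sketch. The job name, the one-minute window, the grouping per compose service and the threshold of 60 come from the description above; the group and alert names, the for duration, the compose_service label name, the team label key and every annotation value are hypothetical placeholders.

```yaml
# Sketch of a Loki ruler rule file (names and URLs are placeholders)
groups:
  - name: rate-alerting              # hypothetical group name
    rules:
      - alert: HighLogRate           # hypothetical alert name
        # per-second log rate over the last minute, per compose service
        expr: |
          sum by (compose_service)
            (rate({job="dockerlogs"}[1m]))
            > 60
        for: 1m                      # assumed: must hold for a minute before firing
        labels:
          severity: warning
          team: devops               # hypothetical label key used for routing
        annotations:
          title: "High LogRate Alert"
          description: "A service is logging more than 60 lines per second"
          dashboard: "https://grafana.example.com/d/service-dashboard"
          runbook: "https://wiki.example.com/runbooks/high-log-rate"
```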
As you can see, I have a couple of labels and annotations, which become very useful when you have dashboard links, runbooks etc. that you would like to map to your alert. I am doing the mapping in my Alertmanager config:
As you can see, when my alert matches nothing it will go to my catchall receiver, but when a label contains devops, the alert is routed to my warning-devops-slack receiver, where we parse our labels and annotations to include their values in our alarm on Slack.
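The routing described above can be sketched as follows. Only the warning-devops-slack receiver name and the devops value are taken from this post; the catchall receiver name, the team label key, the grouping timers, the webhook URL and the channel are placeholders, and the message template is just one possible way to surface labels and annotations.

```yaml
# Sketch of an Alertmanager config with a catchall and a devops route
route:
  receiver: default-catchall-slack   # hypothetical name; used when no child route matches
  group_by: ['alertname']
  group_wait: 10s                    # assumed timers
  group_interval: 1m
  repeat_interval: 3h
  routes:
    - match:
        team: devops                 # hypothetical label key
      receiver: warning-devops-slack

receivers:
  - name: default-catchall-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
        channel: '#alerts'           # placeholder channel
        send_resolved: true

  - name: warning-devops-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
        channel: '#alerts'           # placeholder channel
        send_resolved: true          # also notify when the alert recovers
        title: '[{{ .Status | toUpper }}] {{ .CommonAnnotations.title }}'
        text: |
          *Description:* {{ .CommonAnnotations.description }}
          *Severity:* {{ .CommonLabels.severity }}
          *Dashboard:* {{ .CommonAnnotations.dashboard }}
          *Runbook:* {{ .CommonAnnotations.runbook }}
```

Setting send_resolved to true is what would produce the recovery notification in Slack once the log rate drops below the threshold again.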
Enough with the background details, it’s time to get into the action.
All the code for this demonstration will be available in my github repository: github.com/ruanbekker/loki-alerts-docker
The docker-compose stack has containers for Grafana, Alertmanager, Loki, nginx and an http-client.
The http-client is curl in a while loop that will just make a bunch of http requests against the nginx container, which will be logging to loki.
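To give an idea of how these five services could fit together, here is a rough sketch of such a compose file. It is not the file from the repository: the image tags, file names, ports and the curl loop are assumptions, and the nginx logging section assumes the Loki Docker logging driver plugin is installed on the host under the alias loki.

```yaml
# Sketch of a compose stack for this demo (all values are assumptions)
version: "3.7"

services:
  loki:
    image: grafana/loki:2.0.0
    command: -config.file=/etc/loki/loki.yml
    volumes:
      - ./config/loki.yml:/etc/loki/loki.yml              # hypothetical file names
      - ./config/rules.yml:/etc/loki/rules/fake/rules.yml
    ports:
      - "3100:3100"

  alertmanager:
    image: prom/alertmanager:v0.21.0
    volumes:
      - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"

  grafana:
    image: grafana/grafana:7.3.0
    ports:
      - "3000:3000"

  nginx:
    image: nginx:alpine
    logging:
      driver: loki                   # requires the Loki Docker driver plugin
      options:
        # resolved by the Docker daemon on the host, hence localhost
        loki-url: http://localhost:3100/loki/api/v1/push
        loki-external-labels: job=dockerlogs

  http-client:
    image: curlimages/curl
    entrypoint: ["/bin/sh", "-c"]
    # fire requests at nginx in an endless loop to generate access logs
    command: ["while true; do curl -s -o /dev/null http://nginx/; done"]
```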
Get the source
Get the code from my GitHub repository:
You will need to replace the Slack webhook URL and the Slack channel that you want your alerts to be sent to. The script takes the environment variables and replaces the values in config/alertmanager.yml (always check out the script first, before executing it).
You can double check by running cat config/alertmanager.yml. Once you are done, boot the stack:
Open up Grafana:
Use the initial user and password combination admin/admin and then reset your password:
Browse for your labels on the log explorer section, in my example it will be
When we select our job="dockerlogs" label, we will see our logs:
As I explained earlier, we can run the same query that our ruler will be running to check what the rate currently is:
Which will look like this:
In the expression configured in our ruler config, we have set the alarm to fire once the value goes above 60. We can validate this by running:
We can verify that this is the case, so by now it should be alarming:
Head over to Alertmanager:
We can see Alertmanager is showing the alarm:
When we head over to Slack, we can see our notification:
So let’s stop our http client:
Then we can see that the logging has stopped:
And in Slack, we should see the notification that the alarm has recovered:
Then you can terminate your stack:
Pretty epic stuff, right? I really love how cost effective Loki is, as logging used to be so expensive to run and especially to maintain. Grafana Labs are really doing some epic work and my hat goes off to them.