Personal utility (actually just a command) that I use to reach my Raspberry Pi nodes that have no direct route via the Internet.
Other Projects
There are a lot of other tools out there that already solve this issue, such as inlets, but I wanted my own so that I can extend it with features as I please.
Overview
This is more or less how it looks:
[VPS] <-- Has a Public IP
|
|
[HOME NETWORK] <-- Dynamic IP
|
|
[rpi-01:22], [rpi-02:22] <-- Private IPs
The SSH tunnel is set up from the Raspberry Pi nodes
Each Raspberry Pi claims a unique port on the VPS, and traffic to that port is tunnelled back to the Pi on port 22
To reach Rpi-01, you hop onto the VPS and ssh to localhost port 2201
To reach Rpi-02, you hop onto the VPS and ssh to localhost port 2202, etc
Progress
The tool still needs to be built, but using plain ssh it's quite easy to do by hand.
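A sketch of the manual setup described above; the usernames and the VPS address are placeholders:

# on rpi-01: expose its sshd on the VPS as localhost:2201
$ ssh -fNT -R 2201:localhost:22 user@vps-public-ip

# on rpi-02: same idea, different port
$ ssh -fNT -R 2202:localhost:22 user@vps-public-ip

# then, from the VPS, reach rpi-01
$ ssh -p 2201 pi@localhost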
I am no DBA, but I got curious when I noticed sluggish write performance on a mysql database, and I remembered reading somewhere that you should always prefer batch writes over sequential writes. So I decided to test it out, using a python script and a mysql server.
What will we be doing
I wrote a python script that writes 100,000 records to a database and keeps track of how long the writes took. There are two examples which I will compare:
One script writing each record to the database
One script writing all the records as batch
Sequential Writes
It took 48 seconds to write 100,000 records into the database using sequential writes.
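The original script is not included in this excerpt, but a rough sketch of both approaches using pymysql looks like this; the connection details, table and column names are assumptions:

import time
import pymysql

# connection details are placeholders
connection = pymysql.connect(host="127.0.0.1", user="root", password="password", database="testdb")
records = [("user-{}".format(i), i) for i in range(100000)]
sql = "INSERT INTO demo (name, value) VALUES (%s, %s)"

# sequential writes: one INSERT statement (and round trip) per record
start = time.time()
with connection.cursor() as cursor:
    for record in records:
        cursor.execute(sql, record)
connection.commit()
print("sequential: {:.2f}s".format(time.time() - start))

# batch write: a single executemany call for all the records
start = time.time()
with connection.cursor() as cursor:
    cursor.executemany(sql, records)
connection.commit()
print("batch: {:.2f}s".format(time.time() - start))

Most of the difference comes from doing one round trip per row in the sequential case versus a handful of bulk statements in the batch case.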
In this post we will build an nginx reverse proxy with caching enabled for static content such as images. Nginx will be our frontend, with port 80 exposed, and our ghost blog will run as the backend, to which nginx will proxy traffic.
But why would you want caching?
Returning data from memory is a lot faster than returning data from disk. Without caching, every request made against nginx is proxied to ghost, which fetches the data you requested before it is returned to the client.
So for items that rarely change, like images, we can benefit from caching: the first request will still be made to ghost and the response loaded into the nginx cache, but the next time you request the same image it will be returned from cache instead of making that same request to ghost again.
Caching Info
For this demonstration we define the size of our cache as 500MB, and we specify that if an object has not been accessed for 24 hours it may be expired from the cache.
Nginx
We will build our nginx container by adding our custom nginx config to our dockerfile.
Our Dockerfile will look like the following:
FROM nginx:stable
ADD nginx.conf /etc/nginx/nginx.conf
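The custom nginx.conf itself is not included in this excerpt. A minimal sketch of the caching-relevant parts, assuming the ghost container is reachable as ghost on port 2368 and that these directives live inside the http block of nginx.conf, could look like this:

# 500MB cache, objects expire after 24h without being accessed
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=ghost_cache:10m
                 max_size=500m inactive=24h use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass http://ghost:2368;
        proxy_set_header Host $host;
        proxy_cache ghost_cache;
        proxy_cache_valid 200 24h;
        # expose the cache status (MISS/HIT) to the client
        add_header X-Cache-Status $upstream_cache_status;
    }
}

The add_header X-Cache-Status line is what surfaces the MISS and HIT values you will look for in the browser's developer tools below.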
Once your containers are in a running state, open your browser's developer tools and look at the Network tab, then access your ghost blog on http://localhost:80/. The first time an image is opened you should see the cache status show MISS; when you refresh you should see a HIT, which means that the object is being returned from your cache.
Links that I stumble upon, I always save to getpocket.com and tag them with the relevant info. Then one day I had this random idea to list my links per category on a web service, and I was wondering how to approach that scenario, which led me to this.
In this post we will consume all our saved bookmarks from Pocket and ingest them into elasticsearch. But we don't want to read all the items from Pocket's API every single time the consumer runs, therefore I have a method of checkpointing the last run with a timestamp, so that the next time it runs we have context on where to start from.
What will we be doing
We will authenticate with Pocket, then write the code that reads the data from Pocket and ingests it into elasticsearch.
Authentication
Head over to the developer console on Pocket and create a new application, then save your config in config.py.
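A minimal config.py at this stage only needs the consumer key; the value below is a placeholder:

# config.py
consumer_key = "your-pocket-consumer-key"

The authentication script then looks like the following: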
import config
import requests
import webbrowser
import time
CONSUMER_KEY = config.consumer_key
BASE_URL = "https://getpocket.com"
REDIRECT_URL = "localhost" # <-- you can run python -m SimpleHTTPServer 80 to have a local server listening on port 80
HEADERS = {"Content-Type": "application/json; charset=UTF-8", "X-Accept": "application/json"}
def request_code():
    payload = {
        "consumer_key": CONSUMER_KEY,
        "redirect_uri": REDIRECT_URL,
    }
    response = requests.post("https://getpocket.com/v3/oauth/request", headers=HEADERS, json=payload)
    print("request_code")
    print(response.json())
    return response.json()["code"]

def request_access_token(code):
    payload = {
        "consumer_key": CONSUMER_KEY,
        "code": code,
    }
    # give yourself a few seconds to approve the application in the
    # browser window that request_authorization() opened
    time.sleep(10)
    response = requests.post("https://getpocket.com/v3/oauth/authorize", headers=HEADERS, json=payload)
    print("request_access_token")
    print(response.json())
    return response.json()["access_token"]

def request_authorization(code):
    url = "https://getpocket.com/auth/authorize?request_token={code}&redirect_uri={redirect_url}".format(code=code, redirect_url=REDIRECT_URL)
    print("request_authorization")
    print(url)
    webbrowser.open(url, new=2)

def authenticate_pocket():
    code = request_code()
    request_authorization(code)
    return request_access_token(code)

authenticate_pocket()
# access_token will be returned
Main App
Once we have our access_token we can save that to our config.py; we will also be working with elasticsearch, so we can add our elasticsearch info there as well.
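A sketch of what config.py could contain at this point; the elasticsearch variable names are my own assumptions and the values are placeholders:

# config.py
consumer_key = "your-pocket-consumer-key"
access_token = "your-pocket-access-token"
es_host = "http://localhost:9200"
es_index = "pocket-bookmarks"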
So what we are doing here is reading all the data that you saved in your Pocket account from the Pocket API, and saving the current time in epoch format, which we keep in memory to record when this run consumed the data.
Then, from the data we received, we map the fields that we are interested in into key/value pairs and ingest them into elasticsearch.
After the initial ingestion has been done, which can take some time depending on how many items you have on Pocket, it will write the checkpoint time to elasticsearch so that the client knows from what time to search again on the next run.
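The author's server.py is not included in this excerpt, but a rough sketch of the flow described above could look like this; the checkpoint index, document id and field names are my own assumptions:

import time
import requests
import config

POCKET_GET_URL = "https://getpocket.com/v3/get"

def get_checkpoint():
    # read the epoch timestamp of the previous run from elasticsearch
    response = requests.get("{}/pocket-metadata/_doc/checkpoint".format(config.es_host))
    if response.status_code == 200:
        return response.json()["_source"]["timestamp"]
    return 0

def write_checkpoint(timestamp):
    # record when this run happened so the next run knows where to start
    requests.put(
        "{}/pocket-metadata/_doc/checkpoint".format(config.es_host),
        json={"timestamp": timestamp},
    )

def fetch_pocket_items(since):
    payload = {
        "consumer_key": config.consumer_key,
        "access_token": config.access_token,
        "since": since,
        "detailType": "complete",
    }
    response = requests.post(POCKET_GET_URL, json=payload)
    return response.json().get("list") or {}

def ingest(items):
    # map only the fields we care about into key/value pairs
    for item_id, item in items.items():
        document = {
            "title": item.get("resolved_title"),
            "url": item.get("resolved_url"),
            "time_added": item.get("time_added"),
            "tags": list(item.get("tags", {}).keys()),
        }
        requests.post("{}/{}/_doc".format(config.es_host, config.es_index), json=document)

def run():
    checkpoint = get_checkpoint()
    current_time = int(time.time())
    items = fetch_pocket_items(checkpoint)
    ingest(items)
    write_checkpoint(current_time)

if __name__ == "__main__":
    run()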
This way we don't ingest all the items again. Testing it:
$ python server.py
getting checkpoint id
got checkpoint id: 1591045652
fetch items from pocket
ingesting pocket items into es
got 2 items from pocket
Number of items left to ingest: 2
Number of items left to ingest: 1
writing checkpoint to es: 1591392580
Add one more item to pocket, then run our ingester again:
$ python server.py
getting checkpoint id
got checkpoint id: 1591392580
fetch items from pocket
ingesting pocket items into es
got 1 items from pocket
Number of items left to ingest: 1
writing checkpoint to es: 1591650259
Now that our data is in elasticsearch, we can build a search engine or a web application that lists our favorite links per category. I will write up a post on the search engine in the future.
Thank You
If you liked this please send me a shout out on Twitter: @ruanbekker
This is a getting started on python-rq tutorial and I will demonstrate how to work with asynchronous tasks using python redis queue (python-rq).
What will we be doing
We want a client to submit thousands of jobs in a non-blocking, asynchronous fashion, and then we will have workers which consume these jobs from our redis queue and process those tasks at the rate our consumers can handle.
The nice thing about this is that if our consumer is unavailable, the tasks will remain in the queue, and once the consumer is ready to consume, the tasks will be executed. It's also nice that it's asynchronous, so the client doesn't have to wait until the task has finished.
We will run a redis server using docker, which will be used to queue all our jobs, then we will go through the basics in python and python-rq such as:
Writing a Task
Enqueueing a Job
Getting information from our queue, listing jobs, job statuses
Running our workers to consume from the queue and action our tasks
Basic application which queues jobs to the queue, consumes and action them and monitors the queue
Redis Server
You will require docker for this next step, to start the redis server:
$ docker run --rm -itd --name redis -p 6379:6379 redis:alpine
Python RQ
Install python-rq:
$ pip install rq
Create the task which will be actioned by our workers. In our case it will just be a simple function that adds all the digits from a given string to a list, then sums them up and returns the total value.
This is a very basic task, but it's just for demonstration.
Our tasks.py:
def sum_numbers_from_string(string):
    numbers = []
    for each_character in string:
        if each_character.isdigit():
            numbers.append(int(each_character))
    total = 0
    for each_number in numbers:
        total = total + each_number
    return total
To test this locally:
>>> from tasks import sum_numbers_from_string
>>> sum_numbers_from_string('adje-fje5-sjfdu1s-gdj9-asd1fg')
16
Now, let's import redis and redis-queue along with our task, and instantiate a queue object:
>>> from redis import Redis
>>> from rq import Connection, Queue, Worker
>>> from tasks import sum_numbers_from_string
>>> redis_connection = Redis(host='localhost', port=6379, db=0)
>>> q = Queue(connection=redis_connection)
Submit a Task to the Queue
Let’s submit a task to the queue:
>>> result = q.enqueue(sum_numbers_from_string, 'hbj2-plg5-2xf4r1s-f2lf-9sx4ff')
We have a couple of properties on result which we can inspect. First, let's have a look at the id that we got back when we submitted our task to the queue.
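The original snippet is not included in this excerpt; it would look something like the following, where the output values differ per run (the id shown here is the one that appears in the worker output below):

>>> result.id
'5a607474-cf1b-4fa5-9adb-f8437555a7e7'
>>> result.get_status()
'queued'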
Now that our task is queued, let's fire off our worker to consume the job from the queue and action the task:
>>> w = Worker([q], connection=redis_connection)
>>> w.work()
14:05:35 Worker rq:worker:49658973741d4085961e34e9641227dd: started, version 1.4.1
14:05:35 Listening on default...
14:05:35 Cleaning registries for queue: default
14:05:35 default: tasks.sum_numbers_from_string('hbj2-plg5-2xf4r1s-f2lf-9sx4ff') (5a607474-cf1b-4fa5-9adb-f8437555a7e7)
14:05:40 default: Job OK (5a607474-cf1b-4fa5-9adb-f8437555a7e7)
14:05:40 Result is kept for 500 seconds
14:05:59 Warm shut down requested
True
Now, when we get the status of our job, you will see that it finished:
>>> result.get_status()
'finished'
And to get the result from our worker:
>>> result.result
29
And as before, if you no longer have the result object but you do have the job id, you can fetch the job by its id and then return the result:
>>> result = fetched_job = q.fetch_job('5a607474-cf1b-4fa5-9adb-f8437555a7e7')
>>> result.result
29
Naming Queues
We can namespace our tasks into specific queues, for example if we want to create queue1:
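The original example is cut off in this excerpt, but a minimal sketch with python-rq would be:

>>> q1 = Queue('queue1', connection=redis_connection)
>>> job = q1.enqueue(sum_numbers_from_string, 'abc123')

A worker then needs to listen on that queue explicitly, for example Worker([q1], connection=redis_connection).work().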
I was working with curl to get data from an API, and wanted to get a specific url for a specific name within an array. I got it working using jq, and will be demonstrating how I did it.
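The rest of the post is not included in this excerpt, but the pattern is roughly the following, where the API, field names and output are made up for illustration:

$ curl -s https://api.example.com/apps | jq -r '.[] | select(.name=="app1") | .url'
https://app1.example.com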
This tutorial will demonstrate how I ship my backups to Google Drive using the drive cli utility.
What I really like about the drive cli tool, is that it’s super easy to setup and you can easily script your backups to ship it to google drive.
What we will be doing
We will set up the drive cli tool and authorize it with your Google account, then show how to upload your files to Google Drive from your terminal, and finally create a script to automatically upload your data to Google Drive and include it in a cronjob.
Setup Drive CLI Tool
Head over to the drive releases page and get the latest version; at the moment of writing, 0.3.9 is the latest. Then we will move it into our path and make it executable:
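The exact download commands are not included in this excerpt; they are along these lines, where the release URL is a placeholder for the binary matching your platform:

$ wget -O /usr/local/bin/gdrive <url-of-the-release-binary-for-your-platform>
$ chmod +x /usr/local/bin/gdrive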
You should get output when running version as an argument:
$ gdrive version
drive version: 0.3.9
Credentials
Move to your home directory and initialize; this will ask you to visit the Google accounts web page, where you will authorize this application to use your Google Drive account. Upon successful authorization, you will get an authorization code that you need to paste into your terminal.
This will then write the credentials file to ~/.gd/credentials.json. Always remember to keep this file safe.
$ gdrive init
Visit this URL to get an authorization code
https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=x&redirect_uri=x&response_type=code&scope=x&state=x
Paste the authorization code: < paste authorization code here >
You will now see that the credentials for your application have been saved.
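The first push is not shown in this excerpt; using the same destination as later in the post, it would look something like this, where the archive name and output are illustrative:

$ gdrive push -destination Backups/demo/app1 /opt/backups/*
Resolving...
+ /Backups/demo/app1/app1.backup-2020-06-01.tar.gz
Addition count 1
Proceed with the changes? [Y/n]: y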
As you can see it checks what is on Google Drive and what is on the Local Drive, then determines what needs to be uploaded, and asks you if you want to continue.
If we run that command again, you will see that it does not upload it again, as the content is already on Google Drive:
$ gdrive push -destination Backups/demo/app1 /opt/backups/*
Resolving...
Everything is up-to-date.
To test it out, let’s create a new file and verify if it only uploads the new file:
$ touch file.txt
$ gdrive push -destination Backups/demo/app1 /opt/backups/*
Resolving...
+ /Backups/demo/app1/file.txt
Addition count 1
Proceed with the changes? [Y/n]:y
That is all cool and all, but if we want to script this we don't want to be prompted to continue; we can avoid that by adding the -quiet argument:
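Reusing the destination from above, that looks like:

$ gdrive push -quiet -destination Backups/demo/app1 /opt/backups/*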
Let’s create a script that makes a local archive, then uploads it to Google Drive, I will create the file: /opt/scripts/backup.sh with the following content:
#!/bin/bash
# make a local archive
tar -zcvf /opt/backups/app1.backup-$(date +%F).tar.gz \
/home/me/data/dir1 \
/home/me/data/dir2 \
/home/me/data/dir3 \
/home/me/data/dir4
# backup to gdrive
sleep 1
gdrive push -quiet -destination Backups/Servers/sysadmins.co.za /opt/backups/sysadmins-blog/*
# delete archives older than 14 days from disk
sleep 1
find /opt/backups/ -type f -name "*.tar.gz" -mtime +14 -exec rm {} \;
Make the file executable:
$ chmod +x /opt/scripts/backup.sh
Then, we want to add it as a cronjob so that it runs every night at 23:10 in my case:
Open crontab with crontab -e and add the following entry:
10 23 * * * /opt/scripts/backup.sh
Thank You
Backups are important, especially the day you need to rely on one and it was never made. Plan ahead so you are not in that situation.
In this tutorial we will visualize our Redis Cluster's metrics with Grafana. In order to do that we will set up a redis exporter, which will authenticate with redis, and then configure prometheus to scrape the redis exporter's HTTP endpoint and write the time series data to prometheus.
Install Golang
We need to build a binary from the redis exporter project, and for that we need a Golang environment, so install golang if you don't have it already.
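The installation commands are not included in this excerpt; they usually boil down to something like the following, where the version is just an example and should be replaced with a current release from golang.org/dl:

$ wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
$ tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz
$ export PATH=$PATH:/usr/local/go/bin
$ go version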
Then restart prometheus: if you are running it with docker, redeploy your stack or the prometheus container; if prometheus runs as a service you can use systemctl restart prometheus, depending on your operating system distribution.
Grafana
Head over to Grafana, if you don’t have Grafana, you can view this post to install Grafana.
Then import the dashboard 763 and after some time, you should see a dashboard more or less like this:
In this post we will be setting up an analytics dashboard using grafana to visualize our nginx access logs.
In this tutorial I will be using my other blog, sysadmins.co.za, which is being served by nginx. We will also be setting up the other components such as filebeat, logstash, elasticsearch and redis, which are required if you would like to follow along.
The End Result
We will be able to analyze our Nginx Access logs to answer questions such as:
What are the top 10 countries accessing your website in the last 24 hours?
Who are the top 10 referrers?
What is the most popular page for the past 24 hours?
What does the percentage of 200s vs 404s look like?
Ability to view results based on status code
Everyone loves a World Map to view hotspots
At the end of the tutorial, your dashboard will look similar to this:
High Level Overview
Our infrastructure will require Nginx with Filebeat, Redis, Logstash, Elasticsearch and Grafana and will look like this:
I will drill down how everything is connected:
Nginx has a custom log_format that we define, which writes to /var/log/nginx/access_json.log and is picked up by Filebeat as an input
Filebeat has an output that pushes the data to Redis
Logstash is configured with Redis as an input with configured filter section to transform the data and outputs to Elasticsearch
From Grafana we have a configured Elasticsearch datasource
Use the grafana template to build this awesome dashboard on Grafana
But first, a massive thank you to akiraka for templatizing this dashboard and making it available on Grafana.
Let’s build all the things
I will be using LXD to run my system/server containers (running ubuntu 18), but you can use a vps, cloud instance, multipass, virtualbox, or anything to host your servers that we will be deploying redis, logstash, etc.
Servers provisioned for this setup:
Nginx
Redis
Logstash
Elasticsearch
Grafana
Prometheus
Elasticsearch
If you don’t have a cluster running already, you can follow this tutorial which will help you deploy a HA Elasticsearch Cluster, or if you prefer docker, you can follow this tutorial
Redis
For our in-memory data store, I will be securing my redis installation with a password as well.
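The redis installation steps are not shown in this excerpt; the relevant part for this setup is configuring a password, which can be done with the requirepass directive in /etc/redis/redis.conf (followed by a restart of redis), using the same password as in the commands below:

requirepass 9V5YlWvm8WuC4n1KZLYUEbLruLJLNJEnDzhu4WnAIfgxMmlv

We can then test authenticated access: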
$ redis-cli -a "9V5YlWvm8WuC4n1KZLYUEbLruLJLNJEnDzhu4WnAIfgxMmlv" set test ok
$ redis-cli -a "9V5YlWvm8WuC4n1KZLYUEbLruLJLNJEnDzhu4WnAIfgxMmlv" get test
ok
Now that the repository for elastic is set up, we need to update and install logstash:
$ apt update && apt install logstash -y
Once logstash is installed, we need to provide logstash with a configuration. In our scenario we will have a redis input, a filter section to transform the data, and an elasticsearch output.
Just make sure of the following:
Populate the connection details of redis (we will define the key in filebeat later)
Ensure that GeoLite2-City.mmdb is in the path that I have under filter
Populate the connection details of Elasticsearch and choose a suitable index name; we will need to provide that index name in Grafana later
Create the config: /etc/logstash/conf.d/logs.conf and my config will look like the following. (config source)
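The full config is available at the source linked above; a trimmed-down sketch of the three sections could look like the following, where the hosts, password, key name and index name are assumptions that should match your environment:

input {
  redis {
    host      => "redis-host"
    port      => 6379
    password  => "your-redis-password"
    key       => "filebeat"
    data_type => "list"
  }
}

filter {
  json {
    source => "message"
  }
  geoip {
    source   => "remote_addr"
    database => "/etc/logstash/GeoLite2-City.mmdb"
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch-endpoint-address:9200"]
    index => "logstash-nginx-sysadmins-%{+YYYY.MM.dd}"
  }
}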
On our nginx server we will install nginx and filebeat, then configure nginx to log to a custom log format, and configure filebeat to read the logs and push it to redis.
Next we will configure nginx to log to a separate file with a custom log format that includes data such as the request method, upstream response time, hostname, remote address, etc.
Under the http directive in your /etc/nginx/nginx.conf, configure the log_format and access_log:
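The exact log_format from the post is not included in this excerpt; a sketch along these lines, with field names chosen to match the logstash filter sketch above, would produce one JSON document per request:

log_format json_logs escape=json '{'
    '"remote_addr":"$remote_addr",'
    '"host":"$host",'
    '"request_method":"$request_method",'
    '"request_uri":"$request_uri",'
    '"status":"$status",'
    '"http_referer":"$http_referer",'
    '"http_user_agent":"$http_user_agent",'
    '"request_time":"$request_time",'
    '"upstream_response_time":"$upstream_response_time"'
'}';

access_log /var/log/nginx/access_json.log json_logs;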
If you would like to setup nginx as a reverse proxy to grafana, you can have a look at this blogpost on how to do that.
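Filebeat's configuration is also not shown in this excerpt; a minimal sketch that reads the JSON access log and ships it to redis could be the following, where the redis host and password are assumptions and the key must match the logstash redis input:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access_json.log

output.redis:
  hosts: ["redis-host:6379"]
  password: "your-redis-password"
  key: "filebeat"
  db: 0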
Prometheus
If you don’t have Prometheus installed already, you can view my blogpost on setting up Prometheus.
Verifying
To verify if everything works as expected, make a request to your nginx server, then have a look if your index count on elasticsearch increases:
$ curl http://elasticsearch-endpoint-address:9200/_cat/indices/logstash-*?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-nginx-x-2020.04.28 SWbHCer-TeOcw6bi_695Xw 5 1 58279 0 32.6mb 16.3mb
If you don't, make sure that all the processes are running on the servers, and that each server is able to reach the others on the targeted ports.
The Fun Part: Dashboarding
Now that we have everything in place, the fun part is to build the dashboards, first we need to configure elasticsearch as our datasource and specify the index we want to read from. Open grafana on http://ip.of.grafana.server:3000, default user and password is admin.
Select config on the left and select datasources, add a datasource, select elasticsearch and specify your datasource name (mine is es-nginx in this example) and the url of your elasticsearch endpoint; if you have secured your elasticsearch cluster with authentication, provide the auth, then provide your index name as configured in logstash.
My configured index looks like logstash-nginx-sysadmins-YYYY-MM-dd, therefore I specified the index name as logstash-nginx-sysadmins-*, my time field as @timestamp and the elasticsearch version, then selected save and test, which looks like this:
Now we will import our dashboard template (once again, a massive thank you to Shenxiang, Qingkong and Ruixi, who made this template available!). Head over to dashboards and select import, then provide the ID 11190; after that it will prompt you for a dashboard name and you need to select your Elasticsearch and Prometheus datasources.
The description of the panels is in Chinese; if you would like it in English, I have translated mine and made the dashboard json available in this gist.
Tour of our Dashboard Panels
Looking at our hotspot map:
The summary and top 10 pages:
Page views, historical trends:
Top 10 referers and table data of our logs:
Thank You
I hope this was useful, if you have any issues with this feel free to reach out to me. If you like my work, please feel free to share this post, follow me on Twitter at @ruanbekker or visit me on my website