Ruan Bekker's Blog

From a Curious mind to Posts on Github

Setup a Golang Environment on Ubuntu

In this post I will demonstrate how to set up a Golang environment on Ubuntu.

Get the sources:

Get the latest stable Golang release tarball from https://golang.org/dl/, download it to a directory of your choice, and extract the archive:

$ cd /tmp
$ wget https://dl.google.com/go/go1.11.2.linux-amd64.tar.gz
$ tar -xf go1.11.2.linux-amd64.tar.gz

Once the archive is extracted, set root permissions and move it to the path where your other executable binaries reside:

$ sudo chown -R root:root ./go
$ sudo mv go /usr/local/

Cleanup the downloaded archive:

$ rm -rf go1.*.tar.gz

Path Variables:

Adjust your path variables by appending the following to your ~/.profile:

~/.profile
export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

Source your profile, or open a new tab:

$ source ~/.profile

Test if you can return the version:

$ go version
go version go1.11.2 linux/amd64

Create a Golang Application

Create a simple golang app that prints a string to stdout:

$ cd ~/
$ mkdir -p go/src/hello
$ cd go/src/hello
$ vim app.go

Add the following golang code:

package main

import "fmt"

func main() {
    fmt.Printf("Hello!\n")
}

Build the binary (building app.go directly names the output binary app):

$ go build app.go

Run it:

$ ./app
Hello!

Golang: Building a Basic Web Server in Go

Continuing with our #golang-tutorial blog series, in this post we will set up a basic HTTP server in Go.

Our Web Server:

Our Web Server will respond on 2 Request Paths:

- / -> returns "Hello, World!"
- /cheers -> returns "Cheers!"

Application Code:

If you have not set up your Golang environment, you can do so by visiting @AkyunaAkish's post on Setting up a Golang Development Environment on MacOSX.

Create server.go (or any filename of your choice). Note: I used 2 different ways of returning the HTTP response content, for demonstration.

package main

import (
  "io"
  "log"
  "net/http"
)

func hello(w http.ResponseWriter, r *http.Request) {
  w.Header().Set("Content-Type", "text/plain; charset=utf-8")
  w.WriteHeader(http.StatusOK)
  w.Write([]byte("Hello, World!" + "\n"))
  log.Println("hello function handler was executed")
}

func goodbye(w http.ResponseWriter, r *http.Request) {
  w.Header().Set("Content-Type", "text/plain; charset=utf-8")
  w.WriteHeader(http.StatusOK)
  io.WriteString(w, "Cheers!" + "\n")
  log.Println("goodbye function handler was executed")
}

func main() {
  http.HandleFunc("/", hello)
  http.HandleFunc("/cheers", goodbye)
  http.ListenAndServe(":8000", nil)
}

Explanation of what we are doing:

  • The program runs in the main package
  • We are importing 3 packages: io, log and net/http
  • HandleFunc registers the handler function for the given pattern in the DefaultServeMux; in this case it registers / to the hello handler function and /cheers to the goodbye handler function.
  • Our 2 handler functions take two arguments:
    • The first one is http.ResponseWriter, the corresponding response stream, which is actually an interface type.
    • The second is *http.Request, the corresponding HTTP request. io.WriteString is a helper function that lets you write a string into any writable stream; this is called the io.Writer interface in Golang.
  • ListenAndServe starts an HTTP server with a given address and handler. The handler is usually nil, which means to use DefaultServeMux.
  • The logging is not a requirement, but is used for debugging/verbosity.

Running our Server:

Run the http server:

$ go run server.go

Client Side Requests:

Run client side http requests to your golang web server:

$ curl -i http://localhost:8000/
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Date: Wed, 21 Nov 2018 21:33:42 GMT
Content-Length: 14

Hello, World!

And another request to /cheers:

$ curl -i http://localhost:8000/cheers
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Date: Wed, 21 Nov 2018 21:29:46 GMT
Content-Length: 8

Cheers!

Server Side Output:

As we used the log package, the log output is written to stdout:

$ go run server.go
2018/11/21 23:29:36 hello function handler was executed
2018/11/21 23:29:46 goodbye function handler was executed


Create Read Only Users in MongoDB

In this post I will demonstrate how to set up 2 read-only users in MongoDB: one user that will have access to one MongoDB database and all its collections, and one user with access to one MongoDB database and only one collection.

First Method: Creating and Assigning the User

In the first method we will create the user and assign it the read permissions it needs, in this case read-only access to the mytest db.

First log on to MongoDB and switch to the admin database:

$ mongo -u dbadmin -p --authenticationDatabase admin
> use admin
switched to db admin

Now list the dbs:

> show dbs
admin       0.000GB
mytest      0.000GB

List the collections and read the data from it for demonstration purposes:

> use mytest
> show collections;
col1
col2
> db.col1.find()
{ "_id" : ObjectId("5be3d377b54849bb738e3b6b"), "name" : "ruan" }
> db.col2.find()
{ "_id" : ObjectId("5be3d383b54849bb738e3b6c"), "name" : "stefan" }

Now create the user collectionreader that will have access to read all the collections from the database:

> db.createUser({user: "collectionreader", pwd: "secretpass", roles: [{role: "read", db: "mytest"}]})
Successfully added user: {
  "user" : "collectionreader",
  "roles" : [
    {
      "role" : "read",
      "db" : "mytest"
    }
  ]
}

Exit, then log in with the new user to test the permissions:

$ mongo -u collectionreader -p --authenticationDatabase mytest
> use mytest
switched to db mytest

> show collections
col1
col2

> db.col1.find()
{ "_id" : ObjectId("5be3d377b54849bb738e3b6b"), "name" : "ruan" }

Now let's try to write to a collection:

> db.col1.insert({"name": "james"})
WriteResult({
  "writeError" : {
    "code" : 13,
    "errmsg" : "not authorized on mytest to execute command { insert: \"col1\", documents: [ { _id: ObjectId('5be3d6c0492818b2c966d61a'), name: \"james\" } ], ordered: true }"
  }
})

So we can see it works as expected.
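
If you prefer to check this from Python instead of the mongo shell, a minimal sketch using pymongo (the driver being installed and MongoDB listening on localhost:27017 are assumptions) could look like this:

from pymongo import MongoClient
from pymongo.errors import OperationFailure

# authenticate as the read-only user against the mytest database
client = MongoClient('localhost', 27017,
                     username='collectionreader',
                     password='secretpass',
                     authSource='mytest')
db = client['mytest']

# reads are allowed
print(db.col1.find_one())

# writes should be rejected with an authorization error (code 13)
try:
    db.col1.insert_one({'name': 'james'})
except OperationFailure as e:
    print('write denied: {}'.format(e))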

Second Method: Create Roles and Assign Users to the Roles

In the second method, we will create the roles and then assign the users to those roles. In this scenario, we will only grant a user read access to one collection in a database. Log in with the admin user:

$ mongo -u dbadmin -p --authenticationDatabase admin
> use admin

First create the read only role myReadOnlyRole:

> db.createRole({ role: "myReadOnlyRole", privileges: [{ resource: { db: "mytest", collection: "col2"}, actions: ["find"]}], roles: []})

Now create the user and assign it to the role:

> db.createUser({ user: "reader", pwd: "secretpass", roles: [{ role: "myReadOnlyRole", db: "mytest"}]})

Similarly, if we had an existing user that we would also like to add to that role, we can do so like this:

> db.grantRolesToUser("anotheruser", [ { role: "myReadOnlyRole", db: "mytest" } ])

Log out and log in with the reader user:

$ mongo -u reader -p --authenticationDatabase mytest
> use mytest

Now try to list the collections:

> show collections
2018-11-08T07:42:39.907+0100 E QUERY    [thread1] Error: listCollections failed: {
  "ok" : 0,
  "errmsg" : "not authorized on mytest to execute command { listCollections: 1.0, filter: {} }",
  "code" : 13,
  "codeName" : "Unauthorized"
}

As we only have read (find) access on col2, let's try to read data from collection col1:

> db.col1.find()
Error: error: {
  "ok" : 0,
  "errmsg" : "not authorized on mytest to execute command { find: \"col1\", filter: {} }",
  "code" : 13,
  "codeName" : "Unauthorized"
}

And finally try to read data from the collection we are allowed to read from:

> db.col2.find()
{ "_id" : ObjectId("5be3d383b54849bb738e3b6c"), "name" : "stefan" }

And also make sure we can't write to that collection:

> db.col2.insert({"name": "frank"})
WriteResult({
  "writeError" : {
    "code" : 13,
    "errmsg" : "not authorized on mytest to execute command { insert: \"col2\", documents: [ { _id: ObjectId('5be3db1530a86d900c361465'), name: \"frank\" } ], ordered: true }"
  }
})

Assigning Permissions to Roles

If you later on want to add more permissions to the role, this can easily be done by using grantPrivilegesToRole():

$ mongo -u dbadmin -p --authenticationDatabase admin
> use mytest
> db.grantPrivilegesToRole("myReadOnlyRole", [{ resource: { db : "mytest", collection : "col1"}, actions : ["find"] }])

To view the permissions for that role:

> db.getRole("myReadOnlyRole", { showPrivileges : true })


IAM Policy to Allow Team Wide and User Level Permissions on AWS Secrets Manager

In this post we will simulate a scenario where a team would like to create secrets under team path names like /security-team/prod/* and /security-team/dev/*, and allow all the users from that team to write and read secrets from those paths. Individual users should then also be able to create and read secrets from their own isolated path, /security-team/personal/aws-username/*, for their personal secrets.

Our Scenario:

  • Create IAM Policy
  • Create 2 IAM Users: jack.smith and steve.adams
  • Create IAM Group, Associate IAM Policy to the Group
  • Attach 2 Users to the Group

The IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1541597166491",
            "Action": [
                "secretsmanager:CreateSecret",
                "secretsmanager:DeleteSecret",
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetRandomPassword",
                "secretsmanager:GetSecretValue",
                "secretsmanager:ListSecretVersionIds",
                "secretsmanager:ListSecrets",
                "secretsmanager:PutSecretValue",
                "secretsmanager:TagResource",
                "secretsmanager:UpdateSecret"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:secretsmanager:eu-west-1:123456789012:secret:/security-team/prod/*",
                "arn:aws:secretsmanager:eu-west-1:123456789012:secret:/security-team/dev/*",
                "arn:aws:secretsmanager:eu-west-1:123456789012:secret:/security-team/personal/${aws:username}/*"
            ]
        }
    ]
}
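
The post focuses on the policy itself, but for completeness, a rough boto3 sketch of the rest of the scenario could look like the following (the group name security-team and the policy name are assumptions; the user names come from the scenario above, and the policy JSON is assumed to be saved locally as policy.json):

import boto3

iam = boto3.client('iam')

# the policy document shown above, saved locally as policy.json
with open('policy.json') as f:
    policy_document = f.read()

# create the managed policy
policy = iam.create_policy(
    PolicyName='security-team-secretsmanager',
    PolicyDocument=policy_document
)

# create the group and attach the policy to it
iam.create_group(GroupName='security-team')
iam.attach_group_policy(
    GroupName='security-team',
    PolicyArn=policy['Policy']['Arn']
)

# create the users and add them to the group
for username in ['jack.smith', 'steve.adams']:
    iam.create_user(UserName=username)
    iam.add_user_to_group(GroupName='security-team', UserName=username)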

Either configure the access keys and secret keys in the credential provider using the AWS CLI, or, for this demonstration, use them inside the code as I do below. But never hardcode your credentials in real code.

Create Secrets with Secrets Manager in AWS using Python Boto3

Instantiate a Secrets Manager client for each user:

>>> import boto3
>>> jack = boto3.Session(aws_access_key_id='ya', aws_secret_access_key='xx', region_name='eu-west-1').client('secretsmanager')
>>> steve = boto3.Session(aws_access_key_id='yb', aws_secret_access_key='xx', region_name='eu-west-1').client('secretsmanager')
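
A safer alternative to embedding keys, in line with the note above, is to use named profiles from ~/.aws/credentials (the profile names jack and steve are assumptions):

>>> import boto3
>>> jack = boto3.Session(profile_name='jack', region_name='eu-west-1').client('secretsmanager')
>>> steve = boto3.Session(profile_name='steve', region_name='eu-west-1').client('secretsmanager')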

Create a team wide secret with jack:

>>> jack.create_secret(Name='/security-team/prod/app1/username', SecretString='appreader')
{'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': 'x', 'HTTPHeaders': {'date': 'Thu, 08 Nov 2018 07:50:35 GMT', 'x-amzn-requestid': 'x', 'content-length': '193', 'content-type': 'application/x-amz-json-1.1', 'connection': 'keep-alive'}}, u'VersionId': u'x', u'Name': u'/security-team/prod/app1/username', u'ARN': u'arn:aws:secretsmanager:eu-west-1:123456789012:secret:/security-team/prod/app1/username-12ABC00'}

Let jack and steve try to read the secret:

>>> jack.get_secret_value(SecretId='/security-team/prod/app1/username')['SecretString']
'appreader'
>>> steve.get_secret_value(SecretId='/security-team/prod/app1/username')['SecretString']
'appreader'

Now let jack create a personal secret and read it back:

>>> jack.create_secret(Name='/security-team/personal/jack.smith/svc1/password', SecretString='secret')
>>> jack.get_secret_value(SecretId='/security-team/personal/jack.smith/svc1/password')['SecretString']
'secret'

Now let steve try to read the secret and you will see that access is denied:

>>> steve.get_secret_value(SecretId='/security-team/personal/jack.smith/svc1/password')['SecretString']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:iam::123456789012:user/steve.adams is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:eu-west-1:123456789012:secret:/security-team/personal/jack.smith/svc1/password-a1234b
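
In application code you would typically catch this instead of letting the traceback bubble up; a small sketch of that:

>>> from botocore.exceptions import ClientError
>>> try:
...     steve.get_secret_value(SecretId='/security-team/personal/jack.smith/svc1/password')
... except ClientError as e:
...     if e.response['Error']['Code'] == 'AccessDeniedException':
...         print('steve is not allowed to read this secret')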

That's it for this post.

Get Application Performance Metrics on Python Flask With Elastic APM on Kibana and Elasticsearch

In this post we will set up a Python Flask application that includes the APM agent, which collects metrics that get pushed to the APM Server. If you have not set up the Elastic Stack and/or the APM Server yet, you can follow this post to set up what is needed.

Then we will make a bunch of HTTP Requests to our Application and will go through the metrics per request type.

Application Metrics

Our Application will have the following Request Paths:

  • / - Returns static text
  • /delay - random delays to simulate increased response latencies
  • /upstream - gets data from an upstream provider, with if statements to provide dummy 200, 404 and 502 responses to visualize
  • /5xx - request path that will raise an exception so that we can see the error via apm
  • /sql-write - inserts 5 rows into a sqlite database
  • /sql-read - executes a select all from the database
  • /sql-group - sql query to group all the cities and count them

These are just simple request paths to demonstrate the metrics via APM (Application Performance Monitoring) in Kibana.

Install Flask and APM Agent

Install pip, virtualenv and the application dependencies:

$ apt install python python-setuptools -y
$ easy_install pip
$ pip install virtualenv
$ pip install elastic-apm[flask]
$ pip install flask

For more info, see the APM agent configuration documentation.

Instrument a Bare Bones Python Flask app with APM:

A bare-bones app with APM configured will look like this:

from flask import Flask, jsonify
from elasticapm.contrib.flask import ElasticAPM
from elasticapm.handlers.logging import LoggingHandler

app = Flask(__name__)
apm = ElasticAPM(app, server_url='http://localhost:8200', service_name='flask-app-1', logging=True)

@app.route('/')
def index():
    return jsonify({"message": "response ok"}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
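
As an aside, instead of passing every option as a keyword argument to ElasticAPM() as above, the agent can also read its settings from the Flask config object; a minimal sketch (the service name and server URL are example values):

from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)

# agent settings can live in the Flask config instead of the ElasticAPM() call
app.config['ELASTIC_APM'] = {
    'SERVICE_NAME': 'flask-app-1',
    'SERVER_URL': 'http://localhost:8200',
}

apm = ElasticAPM(app)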

This will provide metrics on the / request path. In order to trace transaction ids from the metrics, we need to configure the index on Kibana. To do this, head over to Kibana, Management, Index Patterns, Add Index Pattern, apm*, select @timestamp as the time filter field name.

This will allow you to see the data when tracing the transaction id’s via the Discover UI.

Create the Python Flask App

Create the Flask App with the request paths as mentioned in the beginning:

import sqlite3, requests, time, logging, random
from flask import Flask, jsonify
from elasticapm.contrib.flask import ElasticAPM
from elasticapm.handlers.logging import LoggingHandler

names = ['ruan', 'stefan', 'philip', 'norman', 'frank', 'pete', 'johnny', 'peter', 'adam']
cities = ['cape town', 'johannesburg', 'pretoria', 'dublin', 'kroonstad', 'bloemfontein', 'port elizabeth', 'auckland', 'sydney']
lastnames = ['smith', 'bekker', 'admams', 'phillips', 'james', 'adamson']

conn = sqlite3.connect('database.db')
conn.execute('CREATE TABLE IF NOT EXISTS people (name STRING, age INTEGER, surname STRING, city STRING)')
#sqlquery_write = conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
seconds = [0.002, 0.003, 0.004, 0.01, 0.3, 0.2, 0.009, 0.015, 0.02, 0.225, 0.009, 0.001, 0.25, 0.030, 0.018]

app = Flask(__name__)
apm = ElasticAPM(app, server_url='http://localhost:8200', service_name='my-app-01', logging=False)

@app.route('/')
def index():
    return jsonify({"message": "response ok"})

@app.route('/delay')
def delay():
    time.sleep(random.choice(seconds))
    return jsonify({"message": "response delay"})

@app.route('/upstream')
def upstream():
    r = requests.get('https://api.ruanbekker.com/people').json()
    r.get('country')
    if r.get('country') == 'italy':
        return 'Italalia!', 200
    elif r.get('country') == 'canada':
        return 'Canada!', 502
    else:
        return 'Not Found', 404

@app.route('/5xx')
def fail_with_5xx():
    value = 'a' + 1
    return jsonify({"message": value})

@app.route('/sql-write')
def sqlw():
    conn = sqlite3.connect('database.db')
    conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
    conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
    conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
    conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
    conn.execute('INSERT INTO people VALUES("{}", "{}", "{}", "{}")'.format(random.choice(names), random.randint(18,40), random.choice(lastnames), random.choice(cities)))
    conn.commit()
    conn.close()
    return 'ok', 200

@app.route('/sql-read')
def sqlr():
    conn = sqlite3.connect('database.db')
    conn.row_factory = sqlite3.Row
    cur = conn.cursor()
    cur.execute('select * from people')
    rows = cur.fetchall()
    conn.close()
    return 'ok', 200

@app.route('/sql-group')
def slqg():
    conn = sqlite3.connect('database.db')
    conn.row_factory = sqlite3.Row
    cur = conn.cursor()
    cur.execute('select count(*) as num, city from people group by city')
    rows = cur.fetchall()
    conn.close()
    return 'ok', 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

Run the app:

$ python app.py

At this point, we won't have any data in APM as we need to make requests to our application. Let's make 10 HTTP GET requests on the / request path:

$ count=0 && while [ $count -lt 10 ]; do curl http://application-routable-address:80/; sleep 1; count=$((count+1)); done

Visualize the Root Request Path

Head over to Kibana, select APM, and set the time picker at the top right corner to the last 15 minutes; you will see something similar to the below. This page gives you an overview of all your configured applications, with the average response times over the selected time, transactions per minute, errors per minute, etc.:

When you select your application, you will find the graphs of your response times and requests per minute, as well as a breakdown per request path:

When selecting the request path, in this case GET /, you will find a breakdown of metrics only for that request, and also the response time distribution for that request path. If you select a frame from the response time distribution, it will filter the focus to that specific transaction.

When you scroll a bit down to the Transaction Sample section, you will find data about the request, response, system etc:

From the Transaction Sample, you can select the View Transaction in Discover button, which will trace that transaction id on the Discover UI:

Increasing the number of curl clients running simultaneously from different servers, and extending the time window to 15 minutes to gather more metrics, results in the screenshot below. Notice that the 6ms response time can easily be traced by selecting it in the response time distribution and then discovering it in the UI, which gives you the raw data from that request:

Viewing Application Errors in APM

Make a couple of requests to /5xx:

$ curl http://application-routable-endpoint:80/5xx

Navigate to the app, select Errors, and you will see the exception details that were returned. Here we can see that in our code we tried to concatenate an integer with a string:

Furthermore, we can select that error and it will provide a direct view of where in our code the error is generated:

Pretty cool, right?! You can also select the library frames, which will take you to the lower-level code that raised the exception. These errors can also be drilled down into via the Discover UI, for example to group by source address.

Simulate Response Latencies:

Make a couple of requests to the /delay request path, and you should see the increased response times from earlier:

Requests where Database Calls are Executed

The while loop to call random request paths:

count=0 && while [ $count -lt 1000 ];
do
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-read;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-group;
  curl -H "Host: my-eu-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-us-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-read;
  curl -H "Host: my-eu-server" -i http://x.x.x.x/sql-group;
  curl -H "Host: my-us-server" -i http://x.x.x.x/sql-group;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-write;
  curl -H "Host: my-eu-server" -i http://x.x.x.x/sql-group;
  curl -H "Host: my-za-server" -i http://x.x.x.x/sql-group;
  count=$((count+1));
done

When we look at our application's performance monitoring overview, we can see the writes have higher latencies than the group-bys:

The /sql-write request overview:

When selecting a transaction sample, we can see the timeline of each database call:

When looking at the /sql-group request overview, we can see the response times increasing over time; as more data is written to the database, it takes longer to read and group all the data:

The transaction details shows the timeline of the database query from that request:

When you select the database select query on the timeline view, it will take you to the exact database query that was executed:

When we combine a database call with an external request to a remote HTTP endpoint, we will see something like:

Viewing 4xx and 5xx Response Codes

From the application code we are returning 2xx, 4xx, and 5xx response codes for this demonstration to visualize them:

Configuring more Applications

Once more apps are configured, and they start serving traffic, they will start appearing on the APM UI as below:

APM is available for other languages as well, and provides getting-started snippets in the APM UI. For more information on APM, have a look at their documentation.

Hope this was useful.

Setup APM Server on Ubuntu for Your Elastic Stack to Get Insights in Your Application Performance Metrics

In this post we will set up the Elastic Stack with Elasticsearch, Kibana and APM. The APM Server (Application Performance Monitoring) receives the metric data from the application side and pushes it to apm indices in Elasticsearch.

This will be a 2-post blog series on APM.

What is APM

From their website APM is described as: “Elastic APM is an application performance monitoring system built on the Elastic Stack. It allows you to monitor software services and applications in real time, collecting detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, etc.”

You get metrics like average and p99 response times, and you also get insight when errors occur; it even allows you to look at the stack trace, pinpointing on which line of your code the error occurred.

APM Agents:

The APM agents are loaded inside your application; application metrics are then pushed to the APM Server (which we will set up in this post), which in turn pushes them to Elasticsearch, where they are consumed by Kibana.

At the time of writing, the APM Agents are supported in the following languages:

  • Node.js
  • Django
  • Flask
  • Ruby on Rails
  • Rack
  • RUM
  • Golang
  • Java

Setup the Elastic Stack

One thing to note: every service in your Elastic Stack needs to be running the same version. In this post we will set up Elasticsearch, APM and Kibana, all running version 6.4.3.

Setup the Pre-Requirements:

Elasticsearch depends on Java, so we will go ahead and set up the repositories:

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ apt-get install apt-transport-https -y
$ echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
$ apt update && apt upgrade -y
$ apt install openjdk-8-jdk -y

Verify that Java is installed:

$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-1ubuntu0.16.04.1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

Setup Kernel parameters for Elasticsearch:

$ sysctl -w vm.max_map_count=262144
$ echo 'vm.max_map_count=262144' >> /etc/sysctl.conf

Setup Elasticsearch:

Search for the latest versions (if you already have Elasticsearch, either upgrade, or install APM on the same version as Elasticsearch/Kibana):

$ apt-cache madison elasticsearch
elasticsearch |      6.4.3 | https://artifacts.elastic.co/packages/6.x/apt stable/main amd64 Packages
elasticsearch |      6.4.2 | https://artifacts.elastic.co/packages/6.x/apt stable/main amd64 Packages

Install Elasticsearch:

$ apt-get install elasticsearch=6.4.3 -y

Configure Elasticsearch to lock the memory on startup:

$ sed -i 's/#bootstrap.memory_lock: true/bootstrap.memory_lock: true/g' /etc/elasticsearch/elasticsearch.yml

Enable Elasticsearch on startup and start the service:

$ systemctl daemon-reload
$ systemctl enable elasticsearch.service
$ systemctl start elasticsearch.service

Install Kibana:

Install Kibana version 6.4.3:

$ apt install kibana=6.4.3 -y

For demonstration, I will configure Kibana to listen on all interfaces on port 5601, but note this will enable access for everyone; alternatively, you can set up an Nginx reverse proxy to upstream to localhost on port 5601.

In this demonstration we are using Elasticsearch locally, so if you have a remote cluster, adjust the configuration where needed.

$ sed -i 's/#server.host: "localhost"/server.host: "0.0.0.0"/'g /etc/kibana/kibana.yml
$ sed -i 's/#elasticsearch.url: "http:\/\/localhost:9200"/elasticsearch.url: "http:\/\/localhost:9200"/'g /etc/kibana/kibana.yml

Enable Kibana on startup and start the service:

$ systemctl enable kibana.service
$ systemctl restart kibana.service

Install the APM Server

Install APM Server version 6.4.3:

$ apt install apm-server=6.4.3 -y

Since we have everything locally, the configuration can be kept as is, but if you need to configure the Elasticsearch or Kibana hosts, it can be done via /etc/apm-server/apm-server.yml.

Then, once Kibana and Elasticsearch are started, load the mapping templates, and enable and start the service:

$ apm-server setup
$ systemctl enable apm-server.service
$ systemctl restart apm-server.service

Ensure all the services are running with netstat -tulpn; ports 9200, 9300, 5601 and 8200 should be listening.
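
If you would rather script that check than eyeball the netstat output, a quick sketch in Python (assuming everything runs on localhost) could be:

import socket

# Elasticsearch HTTP and transport, Kibana, and the APM server
for port in (9200, 9300, 5601, 8200):
    try:
        socket.create_connection(('localhost', port), timeout=2).close()
        print('port {} is listening'.format(port))
    except OSError:
        print('port {} is NOT listening'.format(port))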

Access Your Elastic Stack

Access Kibana on your routable endpoint on port 5601 and you should see something like this:

Configuring an Application to Push Metrics to APM

In the next post I will set up a Python Flask application with APM.

Benchmark Website Response Times With CURL

We can gain insights when making requests to websites such as:

  • Lookup time
  • Connect time
  • AppCon time
  • Redirect time
  • PreXfer time
  • StartXfer time

We will make a request to a website that has caching enabled; the first hit will be a MISS:

$ curl -s -w '\nLookup time:\t%{time_namelookup}\nConnect time:\t%{time_connect}\nAppCon time:\t%{time_appconnect}\nRedirect time:\t%{time_redirect}\nPreXfer time:\t%{time_pretransfer}\nStartXfer time:\t%{time_starttransfer}\n\nTotal time:\t%{time_total}\n' -o /dev/null https://user-images.githubusercontent.com/567298/53351889-85572000-392a-11e9-9720-464e9318206e.jpg

Lookup time:  1.524465
Connect time: 1.707561
AppCon time:  0.000000
Redirect time:    0.000000
PreXfer time: 1.707656
StartXfer time:   1.897660

Total time:   2.451824

The next hit will be a HIT:

$ curl -s -w '\nLookup time:\t%{time_namelookup}\nConnect time:\t%{time_connect}\nAppCon time:\t%{time_appconnect}\nRedirect time:\t%{time_redirect}\nPreXfer time:\t%{time_pretransfer}\nStartXfer time:\t%{time_starttransfer}\n\nTotal time:\t%{time_total}\n' -o /dev/null https://user-images.githubusercontent.com/567298/53351889-85572000-392a-11e9-9720-464e9318206e.jpg

Lookup time:  0.004441
Connect time: 0.188065
AppCon time:  0.000000
Redirect time:    0.000000
PreXfer time: 0.188160
StartXfer time:   0.381344

Total time:   0.926420
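
If you would rather collect these timings from Python, a minimal sketch using pycurl (assuming pycurl is installed) exposes the same counters that curl's -w format uses:

import pycurl

url = 'https://user-images.githubusercontent.com/567298/53351889-85572000-392a-11e9-9720-464e9318206e.jpg'

c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.WRITEFUNCTION, lambda data: len(data))  # discard the response body
c.perform()

# the same counters that the -w format specifiers expose
timings = [
    ('Lookup time', pycurl.NAMELOOKUP_TIME),
    ('Connect time', pycurl.CONNECT_TIME),
    ('AppCon time', pycurl.APPCONNECT_TIME),
    ('Redirect time', pycurl.REDIRECT_TIME),
    ('PreXfer time', pycurl.PRETRANSFER_TIME),
    ('StartXfer time', pycurl.STARTTRANSFER_TIME),
    ('Total time', pycurl.TOTAL_TIME),
]
for label, option in timings:
    print('{}:\t{}'.format(label, c.getinfo(option)))

c.close()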


How to Bootstrap Nodes With Python Using Ansible

As Ansible depends on Python, we can bootstrap our nodes with Python using an Ansible playbook.

Inventory

The nodes we want to bootstrap:

inventory.ini
[new]
node-1
node-2
node-3

[new:vars]
ansible_python_interpreter=/usr/bin/python3

Playbook

Our playbook with what we want to do:

bootstrap-python.yml
---
- hosts: all
  gather_facts: False

  tasks:
  - name: install python
    raw: test -e /usr/bin/python || ( apt update && apt install python -y )

Deploy

Deploy with Ansible:

$ ansible-playbook -i inventory.ini bootstrap-python.yml

PLAY [all] ***********************************************************************************************************************************************************************************************

TASK [install python] ************************************************************************************************************************************************************************************
changed: [node-1]
changed: [node-2]
changed: [node-3]

PLAY RECAP ***********************************************************************************************************************************************************************************************
node-1                     : ok=2    changed=2    unreachable=0    failed=0
node-2                     : ok=2    changed=2    unreachable=0    failed=0
node-3                     : ok=2    changed=2    unreachable=0    failed=0

This is it for this post, all posts for this tutorial will be posted under #ansible-tutorial

How to Install Packages on Remote Systems With Ansible

We will use Ansible to deploy packages to remote systems; in this case all the remote systems are running Debian, so we will be using the APT package manager.

Pre-Requisites:

Ensure that you have installed Ansible and set up the SSH config for your remote systems; how to do that can be found in the post: setting up ansible.

Our Inventory

The inventory file that describes our hosts:

inventory.ini
[scaleway]
cluster-node-1
cluster-node-2

[hetzner]
docker-node-1
docker-node-2
docker-node-3
glusterfs-node-1
glusterfs-node-2
elasticsearch-node-1
elasticsearch-node-2

[scaleway:vars]
ansible_python_interpreter=/usr/bin/python3
location=france

[hetzner:vars]
ansible_python_interpreter=/usr/bin/python3
location=germany

Playbook

Our playbook defines that we want to deploy packages using apt to all hosts:

packages.yml
---
- hosts: all
  tasks:
  - name: Install Packages
    apt: name={{ item }} state=latest update_cache=yes
    with_items:
      - ntp
      - python
      - tcpdump
      - wget
      - openssl
      - curl

Deploy

Running the playbook to deploy the packages to the remote servers:

$ ansible-playbook -i inventory.ini packages.yml

PLAY [all] ***********************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************
ok: [glusterfs-node-2]
ok: [glusterfs-node-1]
ok: [docker-node-1]
ok: [docker-node-2]
ok: [docker-node-3]
ok: [elasticsearch-node-1]
ok: [elasticsearch-node-2]
ok: [cluster-node-1]
ok: [cluster-node-2]

TASK [Install Packages] **********************************************************************************************************************************************************************************
changed: [docker-node-1] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [docker-node-2] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [docker-node-3] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [elasticsearch-node-1] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [glusterfs-node-1] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [glusterfs-node-2] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
changed: [elasticsearch-node-2] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
ok: [cluster-node-1] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])
ok: [cluster-node-2] => (item=[u'ntp', u'python', u'tcpdump', u'wget', u'openssl', u'curl'])

PLAY RECAP ***********************************************************************************************************************************************************************************************
docker-node-1              : ok=2    changed=1    unreachable=0    failed=0
docker-node-2              : ok=2    changed=1    unreachable=0    failed=0
docker-node-3              : ok=2    changed=1    unreachable=0    failed=0
elasticsearch-node-1       : ok=2    changed=1    unreachable=0    failed=0
elasticsearch-node-2       : ok=2    changed=1    unreachable=0    failed=0
glusterfs-node-1           : ok=2    changed=1    unreachable=0    failed=0
glusterfs-node-2           : ok=2    changed=1    unreachable=0    failed=0
cluster-node-1             : ok=2    changed=0    unreachable=0    failed=0
cluster-node-2             : ok=2    changed=0    unreachable=0    failed=0

This is it for this post, all posts for this tutorial will be posted under #ansible-tutorials

Query 24 Hours Worth of Data Using BatchGet on Amazon DynamoDB Using Scan and Filter Without a GSI

I'm testing how to query data in DynamoDB, where the query will always be the retrieval of yesterday's data, without using a Global Secondary Index.

This is done just to see what other ways you can use to query data based on a specific timeframe.

Use-Case:

Data from DynamoDB needs to be batch processed (daily, for the last 24 hours) into an external data source. Data will be written into DynamoDB, and the HK (uuid) and RK (timestamp) will be duplicated to the daily table. Only the uuid and timestamp are duplicated to the daily table, and only data for that day will be written into that datestamp-formatted table name.

Let's say data for 2018-10-30 needs to be written into our external data source: we will do a scan on table tbl-test_20181030, and from the response we will have a list of HashKeys (uuid) which we will use to do a BatchGetItem on our base table, tbl-test_base, which essentially grabs all the data for that day.

If deeper filtering needs to be done on that day's data, a FilterExpression can be used, which means only the filtered-down data is fetched from the base table.

Note: The base table might have millions of items, so a Scan operation on the Base table would be really expensive, as it reads all the items in the table.

Once the data has been processed, the daily or metadata table can be removed.
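
Dropping the processed daily table is a single call; a short boto3 sketch (the table name, region and profile follow the examples further down):

import boto3

client = boto3.Session(region_name='eu-west-1', profile_name='dev').client('dynamodb')

# remove the daily metadata table once its data has been processed
client.delete_table(TableName='tbl-test_20181030')
client.get_waiter('table_not_exists').wait(TableName='tbl-test_20181030')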

DynamoDB Table Design

The base table: tbl-test_base will have:

  • HashKey: uuid (string)
  • RangeKey: timestamp (number)
  • Attributes: city, stream, transaction_date, name, metric_uri
  • Item will look like:
{
  u'uuid': u'fb4ddeb9-3b5e-47b3-bbab-1aa1d8e8f47b',
  u'timestamp': 1540891276,
  u'city': u'sydney',
  u'stream': u'NONE',
  u'transaction_date': u'2018-10-30 11:21:16',
  u'metric_uri': u'some-dummy-metric-uri',
  u'name': u'frank'
}

The daily table tbl-test_20181030 will look like:

  • HashKey: uuid
  • Attributes: timestamp
  • Item will look like:
{
  u'uuid': u'fb4ddeb9-3b5e-47b3-bbab-1aa1d8e8f47b',
  u'timestamp': 1540891276
}

Demonstration using Python

Creating the Metadata table:

import boto3, time, uuid, random

session = boto3.Session(region_name='eu-west-1', profile_name='dev')
resource = session.resource('dynamodb')
client = session.client('dynamodb')

def create_table():
    table_name = "tbl-test_{0}".format(time.strftime("%Y%m%d"))
    response = resource.create_table(
        TableName=table_name,
        KeySchema=[{
            'AttributeName': 'uuid',
            'KeyType': 'HASH'
        }],
        AttributeDefinitions=[{
            'AttributeName': 'uuid',
            'AttributeType': 'S'
        }],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        }
    )

    resource.Table(table_name).wait_until_exists()

    arn = client.describe_table(TableName=table_name)['Table']['TableArn']
    client.tag_resource(
        ResourceArn=arn,
        Tags=[
            {'Key': 'Name','Value': 'dynamo_table'},
            {'Key': 'Environment','Value': 'Dev'},
            {'Key': 'CreatedBy','Value': 'Ruan'}
        ]
    )

    return resource.Table(table_name).table_status

print(create_table())

Write 400 Items to DynamoDB:

import boto3, time, uuid, random

session = boto3.Session(region_name='eu-west-1', profile_name='dev')
resource = session.resource('dynamodb')
client = session.client('dynamodb')

base_table = 'tbl-test_base'
meta_table = 'tbl-test_{0}'.format(time.strftime("%Y%m%d"))

people = ['james', 'john', 'frank', 'paul', 'nathan', 'kevin']
cities = ['ireland', 'cape town', 'pretoria', 'paris', 'amsterdam', 'auckland', 'sydney']

def write_dynamo(uuid, timestamp):
    resource.Table(base_table).put_item(
        Item={
            'uuid': uuid,
            'timestamp': timestamp,
            'metric_uri': 'some-dummy-metric-uri',
            'transaction_date': time.strftime("%Y-%m-%d %H:%M:%S"),
            'name': random.choice(people),
            'stream': 'NONE',
            'city': random.choice(cities)
        }
    )

    resource.Table(meta_table).put_item(
        Item={
            'uuid': uuid,
            'timestamp': timestamp
        }
    )

    return 'Written'

for x in xrange(400):
    time.sleep(1)
    write_dynamo(str(uuid.uuid4()), int(time.time()))
    print(x)

We will get the data for 20181030, but also filter on the timestamp attribute for values greater than 1540841144 in epoch time (which will give us about 254 items).

BatchGetItem supports up to 100 items per call, so we will limit the scans to 100 items per call, then paginate using ExclusiveStartKey with the value of the LastEvaluatedKey that we get from our response:

import boto3,time
from boto3.dynamodb.conditions import Key

base_table = 'tbl-test_base'
meta_table = 'tbl-test_20181030'

session = boto3.Session(region_name='eu-west-1', profile_name='dev')
resource = session.resource('dynamodb')
table = resource.Table(meta_table)
filtering_expression = Key('timestamp').gt(1540841144)

response = table.scan(FilterExpression=filtering_expression, Limit=100)

finished=False
while finished != True:
    if 'LastEvaluatedKey' in response.keys():
        print("Getting {} Items".format(response['Count']))
        items = resource.batch_get_item(RequestItems={base_table: {'Keys': response['Items']}})
        print(items['Responses'][base_table])
        time.sleep(2)
        response = table.scan(FilterExpression=filtering_expression, Limit=100, ExclusiveStartKey=response['LastEvaluatedKey'])
    else:
        print("Getting {} Items".format(response['Count']))
        items = resource.batch_get_item(RequestItems={base_table: {'Keys': response['Items']}})
        print(items['Responses'][base_table])
        finished=True

Running it:

$ python dynamodb-batch-get.py
Getting 100 Items
[{u'city': u'pretoria', u'uuid': u'e8bc0d1c-2b57-4de2-b0e1-35ef1fe0edf1', u'stream': u'NONE', u'timestamp': Decimal('1540846990'), u'transaction_date': u'2018-10-29 23:03:10', u'metric_uri': u'some-dummy-metric-uri', u'name': u'frank'}, {u'city': u'amsterdam', u'uuid':
...
Getting 100 Items
[{u'city': u'sydney', u'uuid': u'5bc51ce9-2809-46c9-a3f2-ff8180086d92', u'stream': u'NONE', u'timestamp': Decimal('1540848599'), u'transaction_date': u'2018-10-29 23:29:59', u'metric_uri': u'some-dummy-metric-uri', u'name': u'frank'}
...
Getting 54 Items
[{u'city': u'cape town', u'uuid': u'5e069f34-0e97-4a49-9ca9-da2213edb689'...

Verifying that each call only scans 100 at a time:

>>> response = table.scan(FilterExpression=filtering_expression, Limit=100)
>>> response.keys()
[u'Count', u'Items', u'LastEvaluatedKey', u'ScannedCount', 'ResponseMetadata']
>>> response.get('LastEvaluatedKey')
{u'uuid': u'e8c52a55-ca9e-4718-83d2-1b44a90f43e6'}
>>> response.get('Count')
100
>>> response.get('ScannedCount')
100

Other Thoughts:

Querying data is a lot easier using a Global Secondary Index where you could similarly have the metric_uri as the HashKey and transaction_date as the RangeKey:

>>> response = table.query(
    IndexName='metric_uri-transaction_date-index',
    KeyConditionExpression=Key('metric_uri').eq('some-dummy-metric-uri') & Key('transaction_date').begins_with('2018-10-30')
)
>>> response['Count']
400

Also note that, depending on how you set up your GSI, in most cases it is an exact duplicate in storage of your base table, so it could potentially double the costs.
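
If you did want to go the GSI route, the index used in the query above could be added to the base table with something along these lines (a sketch; the capacity values are placeholders):

import boto3

client = boto3.Session(region_name='eu-west-1', profile_name='dev').client('dynamodb')

client.update_table(
    TableName='tbl-test_base',
    AttributeDefinitions=[
        {'AttributeName': 'metric_uri', 'AttributeType': 'S'},
        {'AttributeName': 'transaction_date', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'metric_uri-transaction_date-index',
            'KeySchema': [
                {'AttributeName': 'metric_uri', 'KeyType': 'HASH'},
                {'AttributeName': 'transaction_date', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'},
            'ProvisionedThroughput': {'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1}
        }
    }]
)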