Ruan Bekker's Blog

From a Curious mind to Posts on Github

Using Drone CI to Build a Jekyll Site and Deploy to Docker Swarm


CICD Pipelines! <3

In this post I will show you how to set up a CI/CD pipeline using Drone to build a Jekyll site and deploy it to Docker Swarm.

Environment Overview

Jekyll’s Codebase: Our code will be hosted on GitHub (I will demonstrate how to set it up from scratch)

Secret Store: Our secrets, such as the SSH key, the swarm host address, etc., will be stored in Drone's secrets manager

Docker Swarm: Docker Swarm runs Traefik as an HTTP load balancer

Drone Server and Agent: If you don't have Drone, you can set up a Drone server and agent on Docker, or have a look at cloud.drone.io

Workflow:

* Whenever a push to master is received on GitHub, the pipeline will be triggered
* The content of our GitHub repository will be cloned to the agent in a container
* Jekyll will build and the output will be transferred to Docker Swarm using rsync
* The docker-compose.yml will be transferred to the Docker Swarm host using scp
* A docker stack deploy is run via ssh

Install Jekyll Locally

Install Jekyll locally, as we will use it to create the initial site. I am using a Mac, so I will be using brew. For other operating systems, have a look at this post.

I will be demonstrating with a weightloss blog as an example.

Install jekyll:

$ brew install jekyll

Go ahead and create a new site which will host the data for your jekyll site:

$ jekyll new blog-weightloss
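
If you want to preview the site locally before wiring up the pipeline (Jekyll serves on http://127.0.0.1:4000 by default):

$ cd blog-weightloss
$ bundle exec jekyll serve --watch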

Create a Github Repository

First we need to create an empty GitHub repository; in my example it was github.com/ruanbekker/blog-weightloss.git. Once you have created the repo, change into the directory created by the jekyll new command:

$ cd blog-weightloss

Now initialize git, set the remote, add the jekyll data and push to github:

$ git init
$ git remote add origin git@github.com:ruanbekker/blog-weightloss.git # <== change to your repository
$ git add .
$ git commit -m "first commit"
$ git push origin master

You should see your data on your GitHub repository.

Create Secrets on Drone

Logon to the Drone UI, sync repositories, activate the new repository and head over to settings where you will find the secrets section.

Add the following secrets:

Secret Name: swarm_host
Secret Value: ip address of your swarm

Secret Name: swarm_key
Secret Value: contents of your private ssh key

Secret Name: swarm_user
Secret Value: the user that is allowed to ssh
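
If you prefer the command line, the drone CLI can manage secrets as well. A minimal sketch, assuming you have the CLI installed with DRONE_SERVER and DRONE_TOKEN exported (the repository slug and value below are examples, and the flags differ slightly between Drone versions):

$ drone secret add \
  --repository ruanbekker/blog-weightloss \
  --name swarm_host \
  --data 192.0.2.10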

You should now see the three secrets listed under the repository settings.

Add the Drone Config

Drone looks for a .drone.yml file in the root directory of your repository for instructions on how to do its tasks. Let's go ahead and declare our pipeline:

$ vim .drone.yml

And populate the drone config:

pipeline:
  jekyll-build:
    image: jekyll/jekyll:latest
    commands:
      - touch Gemfile.lock
      - chmod a+w Gemfile.lock
      - chown -R jekyll:jekyll /drone
      - gem update --system
      - gem install bundler
      - bundle install
      - bundle exec jekyll build

  transfer-build:
    image: drillster/drone-rsync
    hosts:
      from_secret: swarm_host
    key:
      from_secret: swarm_key
    user:
      from_secret: swarm_user
    source: ./*
    target: ~/my-weightloss-blog.com
    recursive: true
    delete: true
    when:
      branch: [master]
      event: [push]

  transfer-compose:
    image: appleboy/drone-scp
    host:
      from_secret: swarm_host
    username:
      from_secret: swarm_user
    key:
      from_secret: swarm_key
    target: /root/my-weightloss-blog.com
    source:
      - docker-compose.yml
    when:
      branch: [master]
      event: [push]

  deploy-jekyll-to-swarm:
    image: appleboy/drone-ssh
    host:
      from_secret: swarm_host
    username:
      from_secret: swarm_user
    key:
      from_secret: swarm_key
    port: 22
    script:
      - docker stack deploy --prune -c /root/my-weightloss-blog.com/docker-compose.yml apps
    when:
      branch: [master]
      event: [push]

Notifications?

If you want to be notified about your builds, you can add a slack notification step as the last step.

To do that, create a new webhook integration, you can follow this post for a step by step guide. After you have the webhook, go to secrets and create a slack_webhook secret.

Then apply the notification step as shown below; the handlebars expressions are template variables from the plugins/slack plugin:

  notify-via-slack:
    image: plugins/slack
    webhook:
      from_secret: slack_webhook
    channel: system_events
    template: >
      {{#success build.status}}
        [DRONE CI]: *{{uppercase build.status}}* : {{repo.owner}}/{{repo.name}}
        ({{build.branch}} - {{truncate build.commit 8}} | {{build.link}})
      {{else}}
        [DRONE CI]: *{{uppercase build.status}}* : {{repo.owner}}/{{repo.name}}
        ({{build.branch}} - {{truncate build.commit 8}} | {{build.link}})
      {{/success}}

Based on the status, you will get a success or failure notification in the configured channel.

Add the Docker Compose

Next we need to declare our docker compose file which is needed to deploy our jekyll service to the swarm:

$ vim docker-compose.yml

And populate this info (just change the values for your own environment/settings):

version: '3.5'

services:
  myweightlossblog:
    image: ruanbekker/jekyll:contrast
    command: jekyll serve --watch --force_polling --verbose
    networks:
      - appnet
    volumes:
      - /root/my-weightloss-blog.com:/srv/jekyll
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - "traefik.backend.loadbalancer.sticky=false"
        - "traefik.backend.loadbalancer.swarm=true"
        - "traefik.backend=myweightlossblog"
        - "traefik.docker.network=appnet"
        - "traefik.entrypoints=https"
        - "traefik.frontend.passHostHeader=true"
        - "traefik.frontend.rule=Host:www.my-weightloss-blog.com,my-weightloss-blog.com"
        - "traefik.port=4000"
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
networks:
  appnet:
    external: true

Push to Github

Now we need to push our .drone.yml and docker-compose.yml to GitHub. Since the repository is activated on Drone, any push to master will trigger the pipeline, so after this push we can go to Drone and watch the pipeline run.

Add the untracked files and push to github:

$ git add .drone.yml
$ git add docker-compose.yml
$ git commit -m "add drone and docker config"
$ git push origin master

As you head over to your Drone UI, you should see your pipeline output (just look how pretty it is! :D).

Test Jekyll

If your deployment has completed, you should be able to access your application on the configured domain.
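You can also do a quick check from the command line; a minimal sketch, assuming the domain from the compose file above points at your swarm:

$ curl -I https://www.my-weightloss-blog.com/
# expect an HTTP 200 once Traefik has picked up the service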

Absolute amazingness! I really love Drone!

Setup a Blog With Hugo


In this post we will set up a blog on Hugo using the Pickles theme.

What is Hugo

Hugo is an open-source static site generator written in Golang.

Installing Hugo

I'm using a Mac, so I will be installing Hugo with brew; for other operating systems, have a look at their documentation.

$ brew install hugo

Create your new site:

$ hugo new site myblog

Install a Theme

We will use a third-party theme; go ahead and install the Pickles theme:

$ git clone -b release https://github.com/mismith0227/hugo_theme_pickles themes/pickles

Custom Syntax Highlighting

Generate the syntax highlighting CSS; for a list of other styles, see this post:

$ mkdir -p static/css
$ hugo gen chromastyles --style=colorful > static/css/syntax.css

Append this below style.css in themes/pickles/layouts/partials/head.html:

<link rel="stylesheet" href="/css/syntax.css"/>

Set the Pygments settings in config.toml:

baseURL = "http://example.org/"
languageCode = "en-us"
pygmentsCodeFences = true
pygmentsUseClasses = true
title = "My Hugo Site"

Create your First Blogpost

Create your first post:

$ hugo new posts/my-first-post.md
/Users/ruan/myblog/content/posts/my-first-post.md created

Populate your page with some data:

---
title: "My First Post"
date: 2019-04-23T09:39:23+02:00
description: This is an introduction post to showcase Hugo.
slug: hello-world-my-first-post
categories:
- hugo
- blog
tags:
- helloworld
- hugo
- blog
draft: false
---

![](https://hugo-simple-blog.work/images/uploads/gopher_hugo.png)

Hello world and welcome to my first post

## New Beginning

This is a new beginning on my blog on hugo and this seems pretty cool so im adding random text here because I dont know **what** to add here. So im adding a lot more random text here.

This is another test.

## Code

This is python code:


from random import randint
from time import strftime
from faker import Faker

fake = Faker()
randint(1, 2)

numberRuns = 10
timestart = strftime("%Y%m%d-%H%M%S")

destFile = "largedataset-" + timestart + ".txt"
file_object = open(destFile, "a")
file_object.write("username" + "," + "name" + "," + "country" + "\n")

def create_names(fake):
    for x in range(numberRuns):
        genUname = fake.slug()
        genName = fake.first_name()
        genCountry = fake.country()
        file_object.write(genUname + "," + genName + "," + genCountry + "\n")
..


This is bash code:


#!/usr/bin/env bash
var="ruan"
echo "Hello, ${var}"


## Tweets

This is one of my tweets, see [configuration](https://gohugo.io/content-management/shortcodes/#highlight) for more shortcodes:



## Tables

This is a table:

|**id**    |**name**|**surname**|**age**| **city**     |
|----------|--------|-----------|-------|--------------|
|20-1232091|ruan    |bekker     |32     |cape town     |
|20-2531020|stefan  |bester     |32     |kroonstad     |
|20-4835056|michael |le roux    |35     |port elizabeth|

## Lists

This is a list:

* one
* two
* [three](https://example.com)

This is another list:

1. one
2. two
3. [three](https://example.com)

## Images

This is an embedded photo:

![](https://images.pexels.com/photos/248797/pexels-photo-248797.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500)

Run the Server

You can set the flags in your main config as well. Go ahead and run the server:

$ hugo server \
  --baseURL "http://localhost/" \
  --themesDir=themes --theme=pickles \
  --bind=0.0.0.0 --port=8080 --appendPort=true \
  --buildDrafts --watch --environment production

Screenshots

When you access your blog on port 8080 you should see your post rendered with the Pickles theme.


Setup a Drone CICD Environment on Docker With Letsencrypt


What is Drone?

Drone is a self-service continuous delivery platform that can be used for CI/CD pipelines (devopsy stuff), which is really awesome.

With configuration as code, pipelines are configured with a simple, easy-to-read file that you commit to your git repository, hosted on GitHub, GitLab, Gogs, Gitea, etc.

Each pipeline step is executed inside an isolated Docker container whose image is automatically pulled at runtime if not found in the cache.

Show me pipelines!

A pipeline can look as easy as:

kind: pipeline
steps:
- name: test
  image: node
  commands:
  - npm install
  - npm test
services:
- name: database
  image: mysql
  ports:
  - 3306

Open for Testing!

I have enabled public access, so please go ahead and launch your CI/CD pipelines on my Drone setup, as I want to test its stability:

==> https://drone.rbkr.xyz/

What are we doing?

We will deploy a Drone server, which is responsible for the UI and coordination, and 2 Drone agents, which receive instructions from the server whenever steps need to be executed. Steps run on the agents.

Deploy the Servers

I’m using VULTR to deploy 3 nodes on CoreOS: 1 Drone server and 2 Drone agents.

Documentation:
https://docs.drone.io/installation/github/multi-machine/
https://github.com/settings/developers

We will use Github for version control and to delegate auth, therefore we need to register a new application on Github.

Register a new application on GitHub at https://github.com/settings/developers.

Get your Drone server host endpoint and update the fields on the OAuth application form.

You will receive a GitHub Client ID and Client Secret, which we will need later; they will look like this:

Client ID:
xx
Client Secret:
yyy

Generate the shared secret which will be used on the server and agent:

$ openssl rand -hex 16
eb83xxe19a3497f597f53044250df6yy

Create the startup script for the Drone server, which will just be a Docker container running in detached mode. Note that you should use your own domain at SERVER_HOST, and if you want a certificate to be issued automatically, keep DRONE_TLS_AUTOCERT set to true.

$ cat > start_drone-server.sh << EOF
#!/usr/bin/env bash

set -ex

GITHUB_CLIENT_ID=xx
GITHUB_CLIENT_SECRET=yyy
SHARED_SECRET=eb83xxe19a3497f597f53044250df6yy
SERVER_HOST=drone.yourdomain.com
SERVER_PROTOCOL=https

docker run \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  --volume=/var/lib/drone:/data \
  --env=DRONE_GITHUB_SERVER=https://github.com \
  --env=DRONE_GITHUB_CLIENT_ID=${GITHUB_CLIENT_ID} \
  --env=DRONE_GITHUB_CLIENT_SECRET=${GITHUB_CLIENT_SECRET} \
  --env=DRONE_AGENTS_ENABLED=true \
  --env=DRONE_RPC_SECRET=${SHARED_SECRET} \
  --env=DRONE_SERVER_HOST=${SERVER_HOST} \
  --env=DRONE_SERVER_PROTO=${SERVER_PROTOCOL} \
  --env=DRONE_TLS_AUTOCERT=true \
  --publish=80:80 \
  --publish=443:443 \
  --restart=always \
  --detach=true \
  --name=drone \
  drone/drone:1
EOF

Create the startup script for the Drone agent; note that this script needs to be placed on the agent nodes:

$ cat > start_drone-agent.sh << EOF
#!/usr/bin/env bash

set -ex

SHARED_SECRET=eb83xxe19a3497f597f53044250df6yy
AGENT_SERVER_HOST=https://drone.yourdomain.com
SERVER_PROTOCOL=https

docker run \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  --env=DRONE_RPC_SERVER=${AGENT_SERVER_HOST} \
  --env=DRONE_RPC_SECRET=${SHARED_SECRET} \
  --env=DRONE_RUNNER_CAPACITY=2 \
  --env=DRONE_RUNNER_NAME=${HOSTNAME} \
  --restart=always \
  --detach=true \
  --name=drone-agent-02 \
  drone/agent:1
EOF

Logon to the server node and start the Drone server:

$ bash start_drone-server.sh

Login to the agent nodes and start the agents:

$ bash start_drone-agent.sh

The server should show that it’s listening on port 80 and 443:

$ docker ps
CONTAINER ID        IMAGE               COMMAND               CREATED             STATUS              PORTS                                      NAMES
8ea70fc7b967        drone/drone:1       "/bin/drone-server"   12 minutes ago      Up 12 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   drone

Access Drone

Access your Drone instance on port 80, e.g. http://drone.yourdomain.com; you should be automatically redirected to port 443 and presented with a login page.

Login with your GitHub account and allow Drone some time to sync your repositories.

Add the Drone config to your repository: clone https://github.com/ruanbekker/drone-ci-testing, which contains the .drone.yml config from which Drone gets its instructions.

Select the repository to activate (drone-ci-testing in this case) and head over to its settings, where you can add the slack_webhook secret and any other secrets your pipeline needs.

Your build list should be empty at this point.

Trigger a Build

Edit any of the files in the cloned repository and you should see your build running, and eventually completing.

In the build detail view you can also find out where each step ran, see the output of the test steps, and get notified via Slack.

Debugging

If your build fails, it's most likely that you are missing the slack_webhook secret. You can remove the Slack step, which should help you get going with Drone.

More on Drone

Have a look at this document for more examples, as well as their documentation and extensive list of plugins, to become familiar with the configuration.

Setup a Slack Webhook for Sending Messages From Applications


Slack is amazing and I can't live without it.

We can also use custom webhook integrations to allow applications to notify us via Slack in response to events.

What we will be doing

We will configure a custom Slack webhook integration and test out the API to show how easy it is to have it inform us via Slack whenever something happens.

Configuration

Head over to https://{your-team}.slack.com/apps/manage/custom-integrations

Select Incoming Webhooks, then Add Configuration. Select the channel it should post to, and click Add Incoming Webhook Integration.

Save the webhook url that will look like this:

https://hooks.slack.com/services/ABCDEFGHI/ZXCVBNMAS/AbCdEfGhJiKlOpRQwErTyUiO

You can then further configure the integration.
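
If you plan to call the webhook from scripts, it helps to wrap the curl call in a small function. A minimal bash sketch, assuming you export the webhook URL as SLACK_WEBHOOK_URL (the function name is just for illustration):

#!/usr/bin/env bash
# post a plain text message to slack via the incoming webhook
notify_slack() {
  local message="$1"
  curl -s -XPOST -d "payload={\"channel\": \"#system_events\", \"username\": \"My-WebhookBot\", \"text\": \"${message}\"}" "${SLACK_WEBHOOK_URL}"
}

notify_slack "deployment completed"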

Sending Messages

curl -XPOST -d 'payload={"channel": "#system_events", "username": "My-WebhookBot", "text": "This is posted to #general and comes from a bot named <https://alert-system.com/alerts/1234|webhookbot> for details!", "icon_emoji": ":borat:"}' https://hooks.slack.com/services/xx/xx/xx

This will post the message to the channel.

Message Attachment, Error:

curl -XPOST -d 'payload={"channel": "#system_events", "username": "My-WebhookBot", "text": "*Incoming Alert!*", "icon_emoji": ":borat:", "attachments":[{"fallback":"New open task [Urgent]: <http://url_to_task|Test out Slack message attachments>","pretext":"New open task [Urgent]: <http://url_to_task|Test out Slack message attachments>","color":"#D00000","fields":[{"title":"Notes","value":"This is much easier than I thought it would be.","short":false}]}]}' https://hooks.slack.com/services/xx/xx/xx

This results in a red (error) attachment in the channel.

Message Attachment, OK:

curl -XPOST -d 'payload={"channel": "#system_events", "username": "My-WebhookBot", "text": "*Status Update:*", "icon_emoji": ":borat:", "attachments":[{"fallback":"New open task has been closed [OK]: <http://url_to_task|Test out Slack message attachments>","pretext":"Task has been closed [OK]: <http://url_to_task|Test out Slack message attachments>","color":"#28B463","fields":[{"title":"Notes","value":"The error has been resolved and the status is OK","short":false}]}]}' https://hooks.slack.com/services/xx/xx/xx

This results in a green (OK) attachment in the channel.

Join my Slack

If you want to join my slack workspace, use this invite link


MongoDB Examples With Golang

While looking into working with MongoDB using Golang, I found it quite frustrating to get up and running, so I decided to make a quick post about it.

What are we doing?

Examples using the Golang driver for MongoDB to connect, write, read, update and delete documents in MongoDB.

Environment:

Provision a mongodb server in docker:

$ docker network create container-net
$ docker run -itd --name mongodb --network container-net -p 27017:27017 ruanbekker/mongodb
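
To quickly check that MongoDB is up and accepting connections, a sketch assuming the ruanbekker/mongodb image ships the mongo shell:

$ docker exec -it mongodb mongo --eval 'db.runCommand({ping: 1})'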

Drop into a golang environment using docker:

$ docker run -it golang:alpine sh

Get the dependencies:

$ apk add --no-cache git

Change to your project path:

$ mkdir $GOPATH/src/myapp
$ cd $GOPATH/src/myapp

Download the golang mongodb driver:

$ go get go.mongodb.org/mongo-driver/mongo

Connecting to MongoDB in Golang

First example will be to connect to your mongodb instance:

package main

import (
    "context"
    "fmt"
    "log"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo/options"
)

type Person struct {
    Name string
    Age  int
    City string
}

func main() {
    clientOptions := options.Client().ApplyURI("mongodb://mongodb:27017")
    client, err := mongo.Connect(context.TODO(), clientOptions)

    if err != nil {
        log.Fatal(err)
    }

    err = client.Ping(context.TODO(), nil)

    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Connected to MongoDB!")

}

Running our app:

$ go run main.go
Connected to MongoDB!

Writing to MongoDB with Golang

Let’s insert a single document to MongoDB:

func main() {
    ..
    collection := client.Database("mydb").Collection("persons")

    ruan := Person{"Ruan", 34, "Cape Town"}

    insertResult, err := collection.InsertOne(context.TODO(), ruan)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Inserted a Single Document: ", insertResult.InsertedID)
}

Running it will produce:

$ go run main.go
Connected to MongoDB!
Inserted a single document:  ObjectID("5cb717dcf597b4411252341f")

Writing more than one document:

func main() {
    ..
    collection := client.Database("mydb").Collection("persons")

    james := Person{"James", 32, "Nairobi"}
    frankie := Person{"Frankie", 31, "Nairobi"}

    trainers := []interface{}{james, frankie}

    insertManyResult, err := collection.InsertMany(context.TODO(), trainers)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Inserted multiple documents: ", insertManyResult.InsertedIDs)
}

This will output in:

$ go run main.go
Inserted Multiple Documents:  [ObjectID("5cb717dcf597b44112523420") ObjectID("5cb717dcf597b44112523421")]

Updating Documents in MongoDB using Golang

Updating Frankie’s age:

func main() {
    ..
    filter := bson.D{{"name", "Frankie"}}
    update := bson.D{
        {"$inc", bson.D{
            {"age", 1},
        }},
    }

    updateResult, err := collection.UpdateOne(context.TODO(), filter, update)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Matched %v documents and updated %v documents.\n", updateResult.MatchedCount, updateResult.ModifiedCount)
}

Running that will update Frankie’s age:

$ go run main.go
Matched 1 documents and updated 1 documents.

Reading Data from MongoDB

Reading the data:

func main() {
    ..
    filter := bson.D{{"name", "Frankie"}}
    var result Person

    err = collection.FindOne(context.TODO(), filter).Decode(&result)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Found a single document: %+v\n", result)

    findOptions := options.Find()
    findOptions.SetLimit(2)

}
$ go run main.go
Found a single document: {Name:Frankie Age:32 City:Nairobi}

Finding multiple documents and returning the cursor

func main() {
    ..
    var results []*Person
    cur, err := collection.Find(context.TODO(), bson.D{}, findOptions)
    if err != nil {
        log.Fatal(err)
    }

    for cur.Next(context.TODO()) {
        var elem Person
        err := cur.Decode(&elem)
        if err != nil {
            log.Fatal(err)
        }

        results = append(results, &elem)
    }

    if err := cur.Err(); err != nil {
        log.Fatal(err)
    }

    cur.Close(context.TODO())
    fmt.Printf("Found multiple documents (array of pointers): %+v\n", results)
}

Running the example:

$ go run main.go
Found multiple documents (array of pointers): [0xc0001215c0 0xc0001215f0]

Deleting Data from MongoDB:

Deleting our data and closing the connection:

func main() {
    ..
    deleteResult, err := collection.DeleteMany(context.TODO(), bson.D{})
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Deleted %v documents in the trainers collection\n", deleteResult.DeletedCount)

    err = client.Disconnect(context.TODO())

    if err != nil {
        log.Fatal(err)
    } else {
        fmt.Println("Connection to MongoDB closed.")
    }
}

Running the example:

$ go run main.go
Deleted 3 documents in the trainers collection
Connection to MongoDB closed.

The code for this example can be found at github.com/ruanbekker/code-examples/mongodb/golang/examples.go


SQL Inner Joins Examples With SQLite


Overview

In this tutorial we will get started with sqlite and use inner joins to query data from multiple tables to answer specific use case needs.

Connecting to your Sqlite Database

Connecting to your database is done by passing the target database as an argument. You can use additional flags to set the properties that you want to enable:

$ sqlite3 -header -column mydatabase.db

or you can specify the additional options to your config:

cat > ~/.sqliterc << EOF
.mode column
.headers on
EOF

Then connecting to your database:

$ sqlite3 mydatabase.db
-- Loading resources from /Users/ruan/.sqliterc
SQLite version 3.16.0 2016-11-04 19:09:39
Enter ".help" for usage hints.
sqlite>

Create the Tables

Create the users table:

sqlite> create table users (
   ...> id INT(20), name VARCHAR(20), surname VARCHAR(20), city VARCHAR(20),
   ...> age INT(2), credit_card_num VARCHAR(20), job_position VARCHAR(20)
   ...> );

Create the transactions table:

sqlite> create table transactions (
   ...> credit_card_num VARCHAR(20), transaction_id INT(20), shop_name VARCHAR(20),
   ...> product_name VARCHAR(20), spent_total DECIMAL(6,2), purchase_option VARCHAR(20)
   ...> );

You can view the tables using .tables:

sqlite> .tables
transactions  users

And view the schema of a table using .schema <table-name>:

sqlite> .schema users
CREATE TABLE users (
id INT(20), name VARCHAR(20), surname VARCHAR(20), city VARCHAR(20),
age INT(2), credit_card_num VARCHAR(20), job_position VARCHAR(20)
);

Write to Sqlite Database

Now we will populate data to our tables. Insert a record to our users table:

sqlite> insert into users values(1, 'ruan', 'bekker', 'cape town', 31, '2345-8970-6712-4352', 'devops');

Insert a record to our transactions table:

sqlite> insert into transactions values('2345-8970-6712-4352', 981623, 'spaza01', 'burger', 101.02, 'credit_card');

Read from the Sqlite Database

Read the data from the users table:

sqlite> select * from users;
id          name        surname     city        age         credit_card_num      job_position
----------  ----------  ----------  ----------  ----------  -------------------  ------------
1           ruan        bekker      cape town   31          2345-8970-6712-4352  devops

Read the data from the transactions table:

sqlite> select * from transactions;
credit_card_num      transaction_id  shop_name   product_name  spent_total  purchase_option
-------------------  --------------  ----------  ------------  -----------  ---------------
2345-8970-6712-4352  981623          spaza01     burger        101.02       credit_card

Inner Joins with Sqlite

This is where stuff gets interesting.

Let’s say you want to join data from 2 tables, in this example we have a table with our userdata and a table with transaction data.

Say we want to see the user's name, the transaction id, the product they purchased and the total amount spent; for that we will make use of inner joins.

Example looks like this:

sqlite> select a.name, b.transaction_id, b.product_name, b.spent_total
   ...> from users
   ...> as a inner join transactions
   ...> as b on a.credit_card_num = b.credit_card_num
   ...> where a.credit_card_num = '2345-8970-6712-4352';
name        transaction_id  product_name  spent_total
----------  --------------  ------------  -----------
ruan        981623          burger         101.02

Let’s say you don't know the credit card number, but you would like to look it up via the user's id and pass that value to the where statement:

sqlite> select a.name, b.transaction_id, b.product_name, b.spent_total
   ...> from users
   ...> as a inner join transactions
   ...> as b on a.credit_card_num = b.credit_card_num
   ...> where a.credit_card_num = (select credit_card_num from users where id = 1);
name        transaction_id  product_name  spent_total
----------  --------------  ------------  -----------
ruan        981623          burger         101.02

Let’s create another table called products:

sqlite> create table products (
   ...> product_id INTEGER(18), product_name VARCHAR(20),
   ...> product_category VARCHAR(20), product_price DECIMAL(6,2)
   ...> );

Write a record with product data to the table:

sqlite> insert into products values(0231, 'burger', 'fast foods', 101.02);

Now, let's say we need to display the user's name, credit card number, product name, product category and product price, given only the credit card number:

sqlite> select a.name, b.credit_card_num, c.product_name, c.product_category, c.product_price
   ...> from users
   ...> as a inner join transactions
   ...> as b on a.credit_card_num = b.credit_card_num inner join products
   ...> as c on b.product_name = c.product_name
   ...> where a.credit_card_num = '2345-8970-6712-4352' and c.product_name = 'burger';
name        credit_card_num      product_name  product_category   product_price
----------  -------------------  ------------  -----------------  -------------
ruan        2345-8970-6712-4352  burger        fast foods         101.02

Install Elasticsearch With Ansible Tutorial

In this tutorial we will install an Elasticsearch cluster with Ansible (well, rather a single node).

Our inventory:

$ cat inventory.ini
[newes]
esnewnode

[newes:vars]
ansible_python_interpreter=/usr/bin/python3

Our playbook to bootstrap our nodes with Python:

$ cat bootstrap-python.yml
---
- hosts: newes
  gather_facts: False

  tasks:
  - name: install python
    raw: test -e /usr/bin/python || ( apt update && apt install python -y )

Our playbook to provision users:

$ cat provision-users.yml
---
# Provisions User on Nodes
# Setup Passwordless SSH from Jumpbox
# Install Packages using APT
- name: bootstrap python
  hosts: newes
  roles:
    - bootstrap-python

- name: setup pre-requisites
  hosts: newes
  roles:
    - create-sudo-user
    - install-modules
    - configure-hosts-file

#- name: setup ruan user on the nodes
#  become: yes
#  become_user: ruan
#  hosts: admin
#  roles:
#    - configure-admin

- name: copy public key to nodes
  become: yes
  become_user: ruan
  hosts: newes
  roles:
    - copy-keys

- name: install elasticsearch
  hosts: newes
  roles:
    - elasticsearch

The roles that are included in the playbooks above:

$ cat roles/create-sudo-user/tasks/main.yml
---
- name: Create Sudo User
  user: name=ruan
        groups=sudo
        shell=/bin/bash
        generate_ssh_key=no
        state=present

- name: Set Passwordless SSH Access for ruan user
  copy: src=sudoers
        dest=/etc/sudoers.d
        mode=0440

The sudoers file for the create-sudo-user role:

$ cat roles/create-sudo-user/files/sudoers
ruan ALL=(ALL) NOPASSWD:ALL

The role to install all the apt packages:

$ cat roles/install-modules/tasks/main.yml
---
- name: Install Packages
  apt: name={{ item }} state=latest update_cache=yes
  with_items:
    - apt-transport-https
    - ntp
    - python
    - tcpdump
    - wget
    - openssl
    - curl

Role to configure hosts file:

$ cat roles/configure-hosts-file/tasks/main.yml
---
- name: Configure Hosts File
  lineinfile: path=/etc/hosts regexp='.*{{ hostvars[item].ansible_hostname }}$' line="{{ hostvars[item].ansible_default_ipv4.address }} {{ hostvars[item].ansible_hostname }}" state=present
  when: hostvars[item].ansible_default_ipv4.address is defined
  with_items: "{{ groups['newes'] }}"

The role to copy the ssh keys:

$ cat roles/copy-keys/tasks/main.yml
---
- name: Copy Public Key to Other Hosts
  become: true
  become_user: ruan
  copy:
    src: /tmp/id_rsa.pub
    dest: /tmp/id_rsa.pub
    mode: 0644
- name: Append Public key in authorized_keys file
  authorized_key:
    user: ruan
    state: present
    key: ""

The role to install elasticsearch:

$ cat roles/elasticsearch/tasks/main.yml
---
- name: get apt repo key
  apt_key:
    url: https://artifacts.elastic.co/GPG-KEY-elasticsearch
    state: present

- name: install apt repo
  apt_repository:
    repo: deb https://artifacts.elastic.co/packages/6.x/apt stable main
    state: present
    filename: elastic-6.x.list
    update_cache: yes

- name: install java
  apt:
    name: openjdk-8-jre
    state: present
    update_cache: yes

- name: install elasticsearch
  apt:
    name: elasticsearch
    state: present
    update_cache: yes

- name: reload systemd config
  systemd: daemon_reload=yes

- name: enable service elasticsearch and ensure it is not masked
  systemd:
    name: elasticsearch
    enabled: yes
    masked: no

- name: ensure elasticsearch is running
  systemd: state=started name=elasticsearch

Deploy Elasticsearch

Bootstrap python then deploy elasticsearch:

$ ansible-playbook -i inventory.ini -u root bootstrap-python.yml
$ ansible-playbook -i inventory.ini -u root provision-users.yml
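
Tip: you can preview what the provisioning playbook would change without applying anything by using Ansible's check mode (note that raw tasks, like the python bootstrap, are skipped in check mode):

$ ansible-playbook -i inventory.ini -u root provision-users.yml --check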

Test out elasticsearch:

$ curl http://127.0.0.1:9200/
{
  "name" : "Z52AEZ7",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "fUiYVjsSQpCbo9QKEiuvaA",
  "version" : {
    "number" : "6.3.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "424e937",
    "build_date" : "2018-06-11T23:38:03.357887Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Elasticsearch Templates Tutorial


Elasticsearch index templates allow you to define templates that will automatically be applied at index creation time. The templates can include both settings and mappings.

What are we doing?

We want to create a template that defines how a target index should look: it should consist of 1 primary shard and 2 replica shards, and we want to set up the mapping so that we can make use of text and keyword string fields.

So then whenever we create an index which matches our template, the template will be applied on index creation.

String Fields

We will make use of the following string fields in our mappings which will be included in our templates:

Text:

A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words within each full text field

Keyword":

A field to index structured content such as email addresses, hostnames, status codes, zip codes or tags.

They are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value
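
To make the difference concrete: a term query does an exact match against a keyword field, while a match query analyzes the search text against a text field. A quick sketch against the foo-2018.07.20 index that we create later in this post:

# exact match on the keyword field (matches "ruan", but not "Ruan")
$ curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/foo-2018.07.20/_search?pretty' -d '
{"query": {"term": {"first_name": "ruan"}}}'

# analyzed full-text match on the text field
$ curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/foo-2018.07.20/_search?pretty' -d '
{"query": {"match": {"content": "introduction"}}}'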

Note about templates:

Couple of things to keep in mind:

1. Templates get referenced on index creation and do not affect existing indexes
2. When you update a template, you need to specify the whole template again, as the payload overwrites the entire template

View your current templates in your cluster:

$ curl -XGET http://localhost:9200/_cat/templates?v
name                          index_patterns             order      version
.monitoring-kibana            [.monitoring-kibana-6-*]   0          6020099
filebeat-6.3.1                [filebeat-6.3.1-*]         1

Create the template foobar_docs, which will match any index matching foo-* or bar-*. Matching indexes will inherit the settings of 1 primary shard and 2 replica shards, and the mapping template shown below will be applied:

$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_template/foobar_docs -d '
{
  "index_patterns": [
    "foo-*", "bar-*"
  ], 
  "settings": {
    "number_of_shards": 1, 
    "number_of_replicas": 2
  }, 
  "mappings": {
    "type1": {
      "_source": {"enabled": true}, 
      "properties": {"created_at": {"type": "date"}, 
      "title": {"type": "text"}, 
      "status": {"type": "keyword"}, 
      "content": {"type":"text"}, 
      "first_name": {"type": "keyword"}, 
      "last_name": {"type": "keyword"}, 
      "age": {"type":"integer"}, 
      "registered": {"type": "boolean"}
      }
    }
  }
}'
{"acknowledged":true}

View the template from the api:

$ curl -XGET http://localhost:9200/_cat/templates/foobar_docs?v
name        index_patterns order version
foobar_docs [foo-*, bar-*] 0

Create an index. Note that the name test-2018.07.20 does not match our template's foo-* or bar-* patterns, so the template will not be applied to it:

$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/test-2018.07.20
{"acknowledged":true,"shards_acknowledged":true,"index":"test-2018.07.20"}

Verify that the index has been created; since it did not match the template, it got the default settings (5 primary shards):

$ curl -XGET http://localhost:9200/_cat/indices/test-2018.07.20?v
health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test-2018.07.20 -5XOfl0GTEGeHycTwL51vQ   5   1          0            0        2kb          1.1kb

We can also inspect the template like shown below:

$ curl -XGET http://localhost:9200/_template/foobar_docs?pretty
{
  "foobar_docs" : {
    "order" : 0,
    "index_patterns" : [
      "foo-*",
      "bar-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "2"
      }
    },
    "mappings" : {
      "type1" : {
        "_source" : {
          "enabled" : true
        },
        "properties" : {
          "created_at" : {
            "type" : "date"
          },
          "title" : {
            "type" : "text"
          },
          "status" : {
            "type" : "keyword"
          },
          "content" : {
            "type" : "text"
          },
          "first_name" : {
            "type" : "keyword"
          },
          "last_name" : {
            "type" : "keyword"
          },
          "age" : {
            "type" : "integer"
          },
          "registered" : {
            "type" : "boolean"
          }
        }
      }
    },
    "aliases" : { }
  }
}

Ingest a document into your index (this index name matches the foo-* pattern, so our template applies):

$ curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/foo-2018.07.20/type1/ -d '
{
  "title": "this is a post", 
  "status": "active", 
  "content": "introduction post", 
  "first_name": "ruan", 
  "last_name": "bekker", 
  "age": "31", 
  "registered": "true"
}'

Run a search against your elasticsearch index to view the data:

$ curl -XGET http://localhost:9200/foo-2018.07.20/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "ZYfotmQB9mQGWzJT7W2y",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is a post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "ruan",
          "last_name" : "bekker",
          "age" : "31",
          "registered" : "true"
        }
      }
    ]
  }
}

Create another document:

$ curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/foo-2018.07.20/type1/ -d '
{
  "created_at": 1532077144, 
  "title": "this is a another post", 
  "status": "ae", 
  "content": "introduction post", 
  "first_name": "stefan", 
  "last_name": "bester", 
  "age": 34, 
  "registered": "true"
}'

As you guessed, executing another search against Elasticsearch shows us both documents:

$ curl -XGET http://localhost:9200/foo-2018.07.20/_search?pretty
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "ZYfotmQB9mQGWzJT7W2y",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is a post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "ruan",
          "last_name" : "bekker",
          "age" : "31",
          "registered" : "true"
        }
      },
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "rofrtmQB9mQGWzJTxnvp",
        "_score" : 1.0,
        "_source" : {
          "created_at" : 1532077144,
          "title" : "this is a another post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "stefan",
          "last_name" : "bester",
          "age" : 34,
          "registered" : "true"
        }
      }
    ]
  }
}

Let’s run a search query for any documents matching people with the age between 30 and 40:

$ curl -H 'Content-Type: application/json' -XGET http://localhost:9200/foo-2018.07.20/_search?pretty -d '{"query": {"range": {"age": {"gte": 30, "lte": 40}}}}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "ZYfotmQB9mQGWzJT7W2y",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is a post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "ruan",
          "last_name" : "bekker",
          "age" : "31",
          "registered" : "true"
        }
      },
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "rofrtmQB9mQGWzJTxnvp",
        "_score" : 1.0,
        "_source" : {
          "created_at" : 1532077144,
          "title" : "this is a another post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "stefan",
          "last_name" : "bester",
          "age" : 34,
          "registered" : "true"
        }
      }
    ]
  }
}

Search for people with the age between 32 and 40:

$ curl -H 'Content-Type: application/json' -XGET http://localhost:9200/foo-2018.07.20/_search?pretty -d '{"query": {"range": {"age": {"gte": 32, "lte": 40}}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foo-2018.07.20",
        "_type" : "type1",
        "_id" : "rofrtmQB9mQGWzJTxnvp",
        "_score" : 1.0,
        "_source" : {
          "created_at" : 1532077144,
          "title" : "this is a another post",
          "status" : "active",
          "content" : "introduction post",
          "first_name" : "stefan",
          "last_name" : "bester",
          "age" : 34,
          "registered" : "true"
        }
      }
    ]
  }
}

Let’s say we want to update our template with a refresh_interval setting, 2 primary shards and 1 replica:

$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_template/foobar_docs -d '
{
  "index_patterns": ["foo-*", "bar-*"], 
  "settings": {"number_of_shards": 2, "number_of_replicas": 1, "refresh_interval": "15s"}
}'

View the template; as you can see, the stored template now looks exactly like the body we posted to the template API (the mappings are gone, because the payload overwrote the whole template):

$ curl -XGET http://localhost:9200/_template/foobar_docs?pretty
{
  "foobar_docs" : {
    "order" : 0,
    "index_patterns" : [
      "foo-*",
      "bar-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "refresh_interval" : "15s"
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}

View our current index; as you can see, it is unaffected by the template change, as only new indexes pick up the updated template:

$ curl -XGET http://localhost:9200/_cat/indices/foo-2018.07.20?v
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   foo-2018.07.20 ol1pGugrQCKd0xES4R6oFg   1   2          2            0     20.4kb         10.2kb

Create a new index to verify that the template’s config is pulled into the new index:

$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/foo-2018.07.20-new

View the elasticsearch indexes to verify the behavior:

$ curl -XGET http://localhost:9200/_cat/indices/foo-2018.07.*?v
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   foo-2018.07.20     ol1pGugrQCKd0xES4R6oFg   1   2          2            0     20.4kb         10.2kb
green  open   foo-2018.07.20-new g6Ii8jtKRFa1zDVB2IsDBQ   2   1          0            0       920b           460b

Delete the indexes:

$ curl -XDELETE http://localhost:9200/foo-*
{"acknowledged":true}

Delete the templates:

$ curl -XDELETE 'http://localhost:9200/_template/foobar_docs'
{"acknowledged":true}

Verify that the templates are gone:

$ curl -XGET http://localhost:9200/_cat/templates/foobar_docs?v
name index_patterns order version

Resources:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-types.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

Reindex Your Elasticsearch Indexes Tutorial

At times you may find that the indexes in your cluster are not queried that often but you still want them around. But you also want to reduce the resource footprint by reducing the number of shards, and perhaps increase the refresh interval.

As for the refresh interval: if new data comes in and we don't care about having it available in near real time, we can set the refresh interval to, for example, 60 seconds, so new data will only become searchable every 60 seconds (default: 1s).
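
The refresh interval can be changed on an existing index at any time through the settings API, for example:

$ curl -H 'Content-Type: application/json' -XPUT 'http://127.0.0.1:9200/my-index-2019.01.01/_settings' -d '
{"index": {"refresh_interval": "60s"}}'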

Reindexing Elasticsearch Indexes

In this example we will use the scenario where we have daily indexes with 5 primary shards and 1 set of replicas and we would like to create a weekly index with 1 primary shard, 1 replica and the refresh interval of 60 seconds, and reindex the previous weeks data into our weekly index.

Create the target weekly index with the mentioned configuration:

$ curl -H "Content-Type: application/json" -XPUT 'http://127.0.0.1:9200/my-index-2019.01.01-07' -d '
{
  "settings": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "refresh_interval" : "60s"
  }
}
'

Ensure the index exist:

$ curl -s -XGET 'http://127.0.0.1:9200/_cat/indices/my-index-2019.01.01*?v'
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   my-index-2019.01.01      wbFEJCApSpSlbOXzb1Tjxw   5   1      22007            0      6.6mb          3.2mb
green  open   my-index-2019.01.02      cbDmJR7pbpRT3O2x46fj20   5   1      28031            0      7.2mb          3.4mb
..
green  open   my-index-2019.01.01-07   mJR7pJ9O4T3O9jzyI943ca   1   1          0            0       466b           233b

Create the reindex job, specify the source indexes and the destination index where the data must be reindexed to:

$ curl -s -H 'Content-Type: application/json' -XPOST 'http://127.0.0.1:9200/_reindex' -d '
{
  "source": {
      "index": [
          "my-index-2019.01.01",
          "my-index-2019.01.02",
          "my-index-2019.01.03",
          "my-index-2019.01.04",
          "my-index-2019.01.05",
          "my-index-2019.01.06",
          "my-index-2019.01.07"
      ]
  },
  "dest": {
      "index": "my-index-2019.01.01-07"
  }
}
'

You can use the tasks api to monitor the progress:

$ curl -s -XGET 'http://127.0.0.1:9200/_cat/tasks?'
indices:data/write/bulk        -3MIFskURPKxd1tg8P2j0w:912621270 -                                transport 1538459598188 22:53:18 3.1ms       x.x.x.x -3MIFsk
indices:data/write/bulk[s]     -3MIFskURPKxd1tg8P2j0w:912621271 -3MIFskURPKxd1tg8P2j0w:816648230 transport 1538459598188 22:53:18 3.1ms       x.x.x.x -3MIFsk

You can manipulate the output of the tasks API by fetching specific actions:

$ curl -s -XGET 'http://127.0.0.1:9200/_tasks?actions=*data/write/reindex&detailed&pretty'

Or viewing detailed output:

$ curl -s -XGET 'http://127.0.0.1:9200/_cat/tasks?detailed' | grep 'indices:data/write/reindex'
indices:data/write/reindex     IvoqWoUqSgGCQ0ELG21nhg:740560815 -                                transport 1538462294714 23:38:14 1.7m        x.x.x.x IvoqWoU reindex from [my-index-2019.01.01] to [my-index-2019.01.01-07]

Or you could get the json response:

$ curl -s -XGET 'http://127.0.0.1:9200/_tasks?actions=*data/write/reindex&detailed&pretty'
{
  "nodes" : {
    "xx" : {
      "name" : "xx",
      "roles" : [ "data", "ingest" ],
      "tasks" : {
        "xx:876452606" : {
          "node" : "xx",
          "id" : 776452606,
          "type" : "transport",
          "action" : "indices:data/write/reindex",
          "status" : {
            "total" : 4785475,
            "updated" : 0,
            "created" : 234000,
            "deleted" : 0,
            "batches" : 235,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries" : {
              "bulk" : 0,
              "search" : 0
            },
            "throttled_millis" : 0,
            "requests_per_second" : -1.0,
            "throttled_until_millis" : 0
          },
          "description" : "reindex from [my-index-2019.01.07] to [my-index-2019.01.01-07]",
          "start_time_in_millis" : 1538462901120,
          "running_time_in_nanos" : 64654161339,
          "cancellable" : true
        }
      }
    }
  }
}

Anyways, moving along. Reindex jobs will always be listed as a data/write/reindex action, so we can count the output:

$ curl -s -XGET 'http://127.0.0.1:9200/_cat/tasks?'  | grep 'data/write/reindex' | wc -l
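
If you want to wait until the reindex tasks have drained, a small polling sketch against the same endpoint:

while [ "$(curl -s 'http://127.0.0.1:9200/_cat/tasks' | grep -c 'data/write/reindex')" -gt 0 ]; do
  sleep 10
done
echo "all reindex tasks completed"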

If the response is 0 then all the tasks completed and we can have a look at our index again:

$ curl -s -XGET 'http://127.0.0.1:9200/_cat/indices/my-index-2019.01.0*?v'
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   my-index-2019.01.01      wbFEJCApSpSlbOXzb1Tjxw   5   1      22007            0      6.6mb          3.2mb
green  open   my-index-2019.01.02      cbDmJR7pbpRT3O2x46fj20   5   1      28031            0      7.2mb          3.4mb
..
green  open   my-index-2019.01.01-07   mJR7pJ9O4T3O9jzyI943ca   1   1     322007            0     45.9mb         22.9mb

Now that we can verify that the reindex tasks finished and we can see the aggregated result in our target index, we can delete our source indexes:

$ curl -XDELETE 'http://127.0.0.1:9200/my-index-2019.01.01,my-index-2019.01.02,my-index-2019.01.03,my-index-2019.01.04,my-index-2019.01.05,my-index-2019.01.06,my-index-2019.01.07'

Shrink Your Elasticsearch Index by Reducing the Shard Count With the Shards API


Resize your Elasticsearch Index with fewer Primary Shards by using the Shrink API.

In Elasticsearch, every index consists of multiple shards, and every shard in your Elasticsearch cluster contributes to the usage of your CPU, memory, file descriptors, etc. This definitely helps performance through parallel processing. With time series data, for example, you would write and read a lot to and from the index for the current date.

If requests to that index drop off and it is only read from every now and then, we don't need that many shards anymore, and if we have multiple indexes, they may build up and take up unnecessary compute power.

For a scenario where we want to reduce the size of our indexes, we can use the Shrink API to reduce the number of primary shards.

The Shrink API

The shrink index API allows you to shrink an existing index into a new index with fewer primary shards. The requested number of primary shards in the target index must be a factor of the number of shards in the source index. For example an index with 8 primary shards can be shrunk into 4, 2 or 1 primary shards or an index with 15 primary shards can be shrunk into 5, 3 or 1. If the number of shards in the index is a prime number it can only be shrunk into a single primary shard. Before shrinking, a (primary or replica) copy of every shard in the index must be present on the same node.

Steps on Shrinking:

Create the target index with the same definition as the source index, but with a smaller number of primary shards. Then it hard-links segments from the source index into the target index. Finally, it recovers the target index as though it were a closed index which had just been re-opened.

Reduce the Primary Shards of an Index

As you may know, you can only set the number of primary shards at index creation time, while replica shards can be changed on the fly.
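
For example, changing the replica count on a live index is a single settings call:

$ curl -H 'Content-Type: application/json' -XPUT 'http://127.0.0.1:9200/my-index-2019.01.10/_settings' -d '
{"index": {"number_of_replicas": 2}}'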

In this example we have a source index: my-index-2019.01.10 with 5 primary shards and 1 replica shard, which gives us 10 shards for that index, that we would like to shrink to an index named: archive_my-index-2019.01.10 with 1 primary shard and 1 replica shard, which will give us 2 shards for that index.

Have a look at your index:

$ curl -XGET "http://127.0.0.1:9200/_cat/indices/my-index-2019.01.*?v"
health status index                                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   my-index-2019.01.10                       xAijUTSevXirdyTZTN3cuA   5   1   80795533            0      5.9gb          2.9gb
green  open   my-index-2019.01.11                       yb8Cjy9eQwqde8mJhR_vlw   5   5   80590481            0      5.7gb          2.8gb
...

And have a look at the nodes, as we will relocate the shards to a specific node:

$ curl http://127.0.0.1:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
x.x.x.x             8          98   0    0.04    0.03     0.01 m         -      3E9yp60
x.x.x.x            65          99   4    0.43    0.23     0.36 di        -      znFrs18

In this demonstration we only have 2 nodes with a replication factor of 1, which means an index's shards will always reside on both nodes. With more nodes, we need to ensure that we choose a node that holds a copy of every shard.

Look at the shards API, passing the index name to get the index-to-shard allocation:

$ curl http://127.0.0.1:9200/_cat/shards/my-index-2019.01.10?v
index               shard prirep state   docs  store ip       node
my-index-2019.01.10 2     p      STARTED  193  101mb x.x.x.x  Lq9P7eP
my-index-2019.01.10 2     r      STARTED  193  101mb x.x.x.x  F5edOwK
my-index-2019.01.10 4     p      STARTED  197  101mb x.x.x.x  Lq9P7eP
my-index-2019.01.10 4     r      STARTED  197  101mb x.x.x.x  F5edOwK
my-index-2019.01.10 3     r      STARTED  184  101mb x.x.x.x  Lq9P7eP
my-index-2019.01.10 3     p      STARTED  184  101mb x.x.x.x  F5edOwK
my-index-2019.01.10 1     r      STARTED  180  101mb x.x.x.x  Lq9P7eP
my-index-2019.01.10 1     p      STARTED  180  101mb x.x.x.x  F5edOwK
my-index-2019.01.10 0     p      STARTED  187  101mb x.x.x.x  Lq9P7eP
my-index-2019.01.10 0     r      STARTED  187  101mb x.x.x.x  F5edOwK

Create the target index:

$ curl -XPUT -H 'Content-Type: application/json' http://127.0.0.1:9200/archive_my-index-2019.01.10 -d '
{
  "settings": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
  }
}
'

Set the index to read-only and relocate a copy of every shard to the node we identified in the previous step:

$ curl -XPUT -H 'Content-Type: application/json' http://127.0.0.1:9200/my-index-2019.01.10/_settings -d '
{
  "settings": {
      "index.routing.allocation.require._name": "Lq9P7eP",
      "index.blocks.write": true
  }
}
'

Now shrink the source index (my-index-2019.01.10) to the target index (archive_my-index-2019.01.10):

$ curl -XPOST -H 'Content-Type: application/json' http://127.0.0.1:9200/my-index-2019.01.10/_shrink/archive_my-index-2019.01.10

You can monitor the progress by using the Recovery API:

$ curl -s -XGET "http://127.0.0.1:9200/_cat/recovery/my-index-2019.01.10?human&detailed=true"
my-index-2019.01.10 0 23.3s peer done x.x.x.x  F5edOwK x.x.x.x Lq9P7eP n/a n/a 15 15 100.0% 15 635836677 635836677 100.0% 635836677 0 0 100.0%
my-index-2019.01.10 1 22s   peer done x.x.x.x  Lq9P7eP x.x.x.x Lq9P7eP n/a n/a 15 15 100.0% 15 636392649 636392649 100.0% 636392649 0 0 100.0%
my-index-2019.01.10 2 19.6s peer done x.x.x.x  F5edOwK x.x.x.x Lq9P7eP n/a n/a 15 15 100.0% 15 636809671 636809671 100.0% 636809671 0 0 100.0%
my-index-2019.01.10 3 21.5s peer done x.x.x.x  Lq9P7eP x.x.x.x Lq9P7eP n/a n/a 15 15 100.0% 15 636378870 636378870 100.0% 636378870 0 0 100.0%
my-index-2019.01.10 4 23.3s peer done x.x.x.x F5edOwK- x.x.x.x Lq9P7eP n/a n/a 15 15 100.0% 15 636545756 636545756 100.0% 636545756 0 0 100.0%

You can also pass aliases as your table columns for output:

$ curl -s -XGET "http://127.0.0.1:9200/_cat/recovery/my-index-2019.01.10?v&detailed=true&h=index,shard,time,ty,st,shost,thost,f,fp,b,bp"
index                            shard time  ty   st   shost         thost        f  fp     b         bp
my-index-2019.01.10              0     23.3s peer done x.x.x.x x.x.x.x 15 100.0% 635836677 100.0%
...

When the job is done, have a look at your indexes:

$ curl -XGET "http://127.0.0.1:9200/_cat/indices/*my-index-2019.01.10?v"
health status index                                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   archive_my-index-2019.01.10               PAijUTSeRvirdyTZTN3cuA   1   1   80795533            0      5.9gb          2.9gb
green  open   my-index-2019.01.10                       Cb8Cjy9CQwqde8mJhR_vlw   5   1   80795533            0      2.9gb          2.9gb

Remove the block on your old index in order to make it writable:

$ curl -XPUT -H 'Content-Type: application/json' http://127.0.0.1:9200/my-index-2019.01.10/_settings -d '
{
  "settings": {
      "index.routing.allocation.require._name": null,
      "index.blocks.write": null
  }
}
'

Delete the old index:

$ curl -XDELETE -H 'Content-Type: application/json' http://127.0.0.1:9200/my-index-2019.01.10

Note: on AWS Elasticsearch Service, if you don't remove the block and you trigger a redeployment, you will end up with something like this, as shards may still be constrained to a node:

$ curl -s -XGET "${ES_HOST}/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip  node
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  ap9Mx1R
     1        3.6gb    54.9gb    952.8gb   1007.8gb            5 x.x.x.x  x.x.x.x  PqmoQpN   <-----------
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  5p7x4Lc
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  c8kniP3
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  jPwlwsD
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  ljos4mu
   481      904.1gb   990.3gb    521.3gb      1.4tb           65 x.x.x.x  x.x.x.x  qAF-gIU
   481      820.2gb   903.6gb    608.1gb      1.4tb           59 x.x.x.x  x.x.x.x  dR3sNwA
   481      824.6gb   909.1gb    602.6gb      1.4tb           60 x.x.x.x  x.x.x.x  fvL4A9X
   481      792.7gb   876.5gb    635.2gb      1.4tb           57 x.x.x.x  x.x.x.x  lk4svht
   481      779.2gb   864.4gb    647.3gb      1.4tb           57 x.x.x.x  x.x.x.x  uLsej9m
     0           0b    51.2gb    956.5gb   1007.8gb            5 x.x.x.x  x.x.x.x  yM4Ka9l
