Ruan Bekker's Blog

From a Curious mind to Posts on Github

How to Use the MySQL Terraform Provider

In this tutorial we will provision a MySQL Server with Docker and then use Terraform to provision MySQL Users, Database Schemas and MySQL Grants with the MySQL Terraform Provider.

About

Terraform is super powerful and can do a lot of things. And it shines when it provisions Infrastructure. So in a scenario where we use Terraform to provision RDS MySQL Database Instances, we might still want to provision extra MySQL Users, or Database Schemas and the respective MySQL Grants.

Usually you will logon to the database and create them manually with sql syntax. But in this tutorial we want to make use of Docker to provision our MySQL Server and we would like to make use of Terraform to provision the MySQL Database Schemas, Grants and Users.

Instead of using AWS RDS, I will be provisioning a MySQL Server on Docker so that we can keep the costs free, for those who are following along.

We will also go through the steps on how to rotate the database password that we will be provisioning for our user.

MySQL Server

First we will provision a MySQL Server on Docker Containers, I have a docker-compose.yaml which is available in my quick-starts github repository:

1
2
3
4
5
6
7
8
9
10
11
version: "3.8"

services:
  mysql:
    image: mysql:8.0
    container_name: mysql
    ports:
      - 3306:3306
    environment:
      - MYSQL_DATABASE=sample
      - MYSQL_ROOT_PASSWORD=rootpassword

Once you have saved that in your current working directory, you can start the container with docker compose:

1
docker-compose up -d

You can test the mysql container by logging onto the mysql server with the correct auth:

1
docker exec -it mysql mysql -u root -prootpassword -e 'show databases;'

This should be more or less the output:

1
2
3
4
5
6
7
8
9
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sample             |
| sys                |
+--------------------+

Terraform

If you don’t have Terraform installed, you can install it from their documentation.

If you want the source code of this example, its available in my terraform-mysql/petoju-provider repository. Which you can clone and jump into the terraform/mysql/petoju-provider directory.

First we will define the providers.tf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
terraform {
  required_providers {
    mysql = {
      source = "petoju/mysql"
      version = "3.0.37"
    }
  }
}

provider "mysql" {
  alias    = "local"
  endpoint = "127.0.0.1:3306"
  username = "root"
  password = "rootpassword"
}

Then the main.tf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
resource "random_password" "user_password" {
  length           = 24
  special          = true
  min_special      = 2
  override_special = "!#$%^&*()-_=+[]{}<>:?"
  keepers = {
    password_version = var.password_version
  }
}

resource "mysql_database" "user_db" {
  provider = mysql.local
  name = var.database_name
}

resource "mysql_user" "user_id" {
  provider = mysql.local
  user = var.database_username
  plaintext_password = random_password.user_password.result
  host = "%"
  tls_option = "NONE"
}

resource "mysql_grant" "user_id" {
  provider = mysql.local
  user = var.database_username
  host = "%"
  database = var.database_name
  privileges = ["SELECT", "UPDATE"]
  depends_on = [
    mysql_user.user_id
  ]
}

Then the variables.tf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
variable "database_name" {
  description = "The name of the database that you want created."
  type        = string
  default     = null
}

variable "database_username" {
  description = "The name of the database username that you want created."
  type        = string
  default     = null
}

variable "password_version" {
  description = "The password rotates when this value gets updated."
  type        = number
  default     = 0
}

Then our outputs.tf:

1
2
3
4
5
6
7
8
output "user" {
  value = mysql_user.user_id.user
}

output "password" {
  sensitive = true
  value = random_password.user_password.result
}

Our terraform.tfvars that defines the values of our variables:

1
2
3
database_name     = "foobar"
database_username = "ruanb"
password_version  = 0

Now we are ready to run our terraform code, which will ultimately create a database, user and grants. Outputs the encrypted string of your password which was encrypted with your keybase_username.

Initialise Terraform:

1
terraform init

Run the plan to see what terraform wants to provision:

1
terraform plan

And we can see the following resources will be created:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # mysql_database.user_db will be created
  + resource "mysql_database" "user_db" {
      + default_character_set = "utf8mb4"
      + default_collation     = "utf8mb4_general_ci"
      + id                    = (known after apply)
      + name                  = "foobar"
    }

  # mysql_grant.user_id will be created
  + resource "mysql_grant" "user_id" {
      + database   = "foobar"
      + grant      = false
      + host       = "%"
      + id         = (known after apply)
      + privileges = [
          + "SELECT",
          + "UPDATE",
        ]
      + table      = "*"
      + tls_option = "NONE"
      + user       = "ruanb"
    }

  # mysql_user.user_id will be created
  + resource "mysql_user" "user_id" {
      + host               = "%"
      + id                 = (known after apply)
      + plaintext_password = (sensitive value)
      + tls_option         = "NONE"
      + user               = "ruanb"
    }

  # random_password.user_password will be created
  + resource "random_password" "user_password" {
      + bcrypt_hash      = (sensitive value)
      + id               = (known after apply)
      + keepers          = {
          + "password_version" = "0"
        }
      + length           = 24
      + lower            = true
      + min_lower        = 0
      + min_numeric      = 0
      + min_special      = 2
      + min_upper        = 0
      + number           = true
      + numeric          = true
      + override_special = "!#$%^&*()-_=+[]{}<>:?"
      + result           = (sensitive value)
      + special          = true
      + upper            = true
    }

Plan: 4 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + password = (sensitive value)
  + user     = "ruanb"

Run the apply which will create the database, the user, sets the password and applies the grants:

1
terraform apply

Then our returned output should show something like this:

1
2
3
4
5
6
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

password = <sensitive>
user = "ruanb"

As our password is set as sensitive, we can access the value with terraform output -raw password, let’s assign the password to a variable:

1
DBPASS=$(terraform output -raw password)

Then we can exec into the mysql container and logon to the mysql server with our new credentials:

1
docker exec -it mysql mysql -u ruanb -p$DBPASS

And we can see we are logged onto the mysql server:

1
2
3
4
5
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 8.0.33 MySQL Community Server - GPL

mysql>

If we run show databases; we should see the following:

1
2
3
4
5
6
7
8
9
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| foobar             |
| information_schema |
| performance_schema |
+--------------------+
3 rows in set (0.03 sec)

If we want to rotate the mysql password for the user, we can update the password_version variable either in our terraform.tfvars or via the cli. Let’s pass the variable in the cli and do a terraform plan to verify the changes:

1
terraform plan -var password_version=1

And due to our value for the random resource keepers parameter being updated, it will trigger the value of our password to be changed, and that will let terraform update our mysql user’s password:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # mysql_user.user_id will be updated in-place
  ~ resource "mysql_user" "user_id" {
        id                 = "ruanb@%"
      ~ plaintext_password = (sensitive value)
        # (5 unchanged attributes hidden)
    }

  # random_password.user_password must be replaced
-/+ resource "random_password" "user_password" {
      ~ bcrypt_hash      = (sensitive value)
      ~ id               = "none" -> (known after apply)
      ~ keepers          = { # forces replacement
          ~ "password_version" = "0" -> "1"
        }
      ~ result           = (sensitive value)
        # (11 unchanged attributes hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.

Let’s go ahead by updating our password:

1
terraform apply -var password_version=1 -auto-approve

To validate that the password has changed, we can try to logon to mysql by using the password variable that was created initially:

1
docker exec -it mysql mysql -u ruanb -p$DBPASS

And as you can see authentication failed:

1
2
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'ruanb'@'localhost' (using password: YES)

Set the new password to the variable again:

1
DBPASS=$(terraform output -raw password)

Then try to logon again:

1
docker exec -it mysql mysql -u ruanb -p$DBPASS

And we can see we are logged on again:

1
2
3
4
5
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 8.0.33 MySQL Community Server - GPL

mysql>

Resources

The terraform mysql provider: - https://registry.terraform.io/providers/petoju/mysql/latest/docs

The quick-starts repository: - https://github.com/ruanbekker/quick-starts

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

How to Use the AWS Terraform Provider

In this post we will be using the AWS Terraform provider, from how to install Terraform, create a AWS IAM User, configure the AWS Provider and deploy a EC2 instance using Terraform.

AWS IAM User

In order to authenticate against AWS’s APIs, we need to create a AWS IAM User and create Access Keys for Terraform to use to authenticate.

From https://aws.amazon.com/ logon to your account, then search for IAM:

aws-iam-search-result

Select IAM, then select “Users” on the left hand side and select “Create User”, then provide the username for your AWS IAM User:

aws-iam-user-creation-wizard

Now we need to assign permissions to our new AWS IAM User. For this scenario I will be assigning a IAM Policy directly to the user and I will be selecting the “AdministratorAccess” policy. Keep in mind that this allows admin access to your whole AWS account:

permissions-for-your-aws-iam-user

Once you select the policy, select “Next” and select “Create User”. Once the user has been created, select “Users” on the left hand side, search for your user that we created, in my case “medium-terraform”.

Select the user and click on “Security credentials”. If you scroll down to the “Access keys” section, you will notice we don’t have any access keys for this user:

aws-iam-access-keys

In order to allow Terraform access to our AWS Account, we need to create access keys that Terraform will use, and because we assigned full admin access to the user, Terraform will be able to manage resources in our AWS Account.

Click “Create access key”, then select the “CLI” option and select the confirmation at the bottom:

aws-iam-access-keys-wizard

Select “Next” and then select “Create access key”. I am providing a screenshot of the Access Key and Secret Access Key that has been provided, but by the time this post has been published, the key will be deleted.

retrieve-aws-iam-access-keys

Store your Access Key and Secret Access Key in a secure place and treat this like your passwords. If someone gets access to these keys they can manage your whole AWS Account.

I will be using the AWS CLI to configure my Access Key and Secret Access Key, as I will configure Terraform later to read my Access Keys from the Credential Provider config.

First we need to configure the AWS CLI by passing the profile name, which I have chosen medium for this demonstration:

1
aws --profile medium configure

We will be asked to provide the access key, secret access key, aws region and the default output:

1
2
3
4
AWS Access Key ID [None]: AKIATPRT2G4SGXLAC3HJ
AWS Secret Access Key [None]: KODnR[............]nYTYbd
Default region name [None]: eu-west-1
Default output format [None]: json

To verify if everything works as expected we can use the following command to verify:

1
aws --profile medium sts get-caller-identity

The response should look something similar to the following:

1
2
3
4
5
{
    "UserId": "AIDATPRT2G4SOAO5Y7S5Z",
    "Account": "000000000000",
    "Arn": "arn:aws:iam::000000000000:user/medium-terraform"
}

Terraform

Now that we have our AWS IAM User configured, we can install Terraform, if you don’t have Terraform installed yet, you can follow their Installation Documentation.

Once you have Terraform installed, we can setup our workspace where we will ultimately deploy a EC2 instance, but before we get there we need to create our project directory and change to that directory:

1
2
mkdir ~/terraform-demo
cd ~/terraform-demo

Then we will create 4 files with .tf extensions:

1
2
3
4
touch main.tf
touch outputs.tf
touch providers.tf
touch variables.tf

We will define our Terraform definitions on how we want our desired infrastructure to look like. We will get to the content in the files soon.

I personally love Terraform’s documentation as they are rich in examples and really easy to use.

Head over to the Terraform AWS Provider documentation and you scroll a bit down, you can see the Authentication and Configuration section where they outline the order in how Terraform will look for credentials and we will be making use of the shared credentials file as that is where our access key and secret access key is stored.

If you look at the top right corner of the Terraform AWS Provider documentation, they show you how to use the AWS Provider:

terraform-aws-provider-docs

We can copy that code snippet and paste it into our providers.tf file and configure the aws provider section with the medium profile that we’ve created earlier.

This will tell Terraform where to look for credentials in order to authenticate with AWS.

Open providers.tf with your editor of choice:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "5.8.0"
    }
  }
}

provider "aws" {
  shared_credentials_files = ["~/.aws/credentials"]
  profile                  = "medium"
  region                   = "eu-west-1"
}

Then we can open main.tf and populate the following to define the EC2 instance that we want to provision:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
data "aws_ami" "latest_ubuntu" {
  most_recent = true
  owners = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-*-server-*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}

resource "aws_instance" "ec2" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = var.instance_type
  tags = {
    Name = "${var.instance_name}-ec2-instance"
  }
}

In the above example we are filtering for the latest Ubuntu 22.04 64bit AMI then we are defining a EC2 instance and specifying the AMI ID that we filtered from our data source.

Note that we haven’t specified a SSH Keypair, as we are just focusing on how to provision a EC2 instance.

As you can see we are also referencing variables, which we need to define in variables.tf :

1
2
3
4
5
6
7
8
9
10
11
variable "instance_name" {
  description = "Instance Name for EC2."
  type        = string
  default     = "test"
}

variable "instance_type" {
  description = "Instance Type for EC2."
  type        = string
  default     = "t2.micro"
}

And then lastly we need to define our outputs.tf which will be used to output the instance id and ip address:

1
2
3
4
5
6
7
output "instance_id" {
  value = aws_instance.ec2.id
}

output "ip" {
  value = aws_instance.ec2.public_ip
}

Deploy our EC2 with Terraform

Now that our infrastructure has been defined as code, we can first initialise terraform which will initialise the backend and download all the providers that has been defined:

1
terraform init

Once that has done we can run a “plan” which will show us what Terraform will deploy:

1
terraform plan

Now terraform will show us the difference in what we have defined, and what is actually in AWS, as we know its a new account with zero infrastructure, the diff should show us that it needs to create a EC2 instance.

The response from the terraform plan shows us the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.ec2 will be created
  + resource "aws_instance" "ec2" {
      + ami                                  = "ami-0f56955469757e5aa"
      + arn                                  = (known after apply)
      + id                                   = (known after apply)
      + instance_type                        = "t2.micro"
      + key_name                             = (known after apply)
      + private_ip                           = (known after apply)
      + public_ip                            = (known after apply)
      + security_groups                      = (known after apply)
      + subnet_id                            = (known after apply)
      + tags                                 = {
          + "Name" = "test-ec2-instance"
        }
      + tags_all                             = {
          + "Name" = "test-ec2-instance"
        }
      + vpc_security_group_ids               = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + instance_id = (known after apply)
  + ip          = (known after apply)

As you can see terraform has looked up the AMI ID using the data source, and we can see that terraform will provision 1 resource which is a EC2 instance. Once we hare happy with the plan, we can run a apply which will show us the same but this time prompt us if we want to proceed:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_instance.ec2: Creating...
aws_instance.ec2: Still creating... [10s elapsed]
aws_instance.ec2: Still creating... [20s elapsed]
aws_instance.ec2: Still creating... [30s elapsed]
aws_instance.ec2: Creation complete after 35s [id=i-005c08b899229fff0]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

instance_id = "i-005c08b899229fff0"
ip = "34.253.196.167"

And now we can see our EC2 instance was provisioned and our outputs returned the instance id as well as the public ip address.

We can also confirm this by looking at the AWS EC2 Console:

aws-ec2-instances-in-console

Note that Terraform Configuration is idempotent, so when we run a terraform apply again, terraform will check what we have defined as what we want our desired infrastructure to be like, and what we actually have in our AWS Account, and since we haven’t made any changes there should be no changes.

We can run a terraform apply to validate that:

1
terraform apply

And we can see the response shows:

1
2
3
4
5
6
7
8
9
data.aws_vpc.selected: Reading...
data.aws_ami.latest_ubuntu: Reading...
data.aws_ami.latest_ubuntu: Read complete after 1s [id=ami-0f56955469757e5aa]
data.aws_vpc.selected: Read complete after 1s [id=vpc-063d7ac3124053dfa]
data.aws_subnet.selected: Reading...
data.aws_subnet.selected: Read complete after 1s [id=subnet-0b7acd7593611c1bb]
aws_instance.ec2: Refreshing state... [id=i-005c08b899229fff0]

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Cleanup

Destroy the infrastructure that we provisioned:

1
terraform destroy

It will show us what terraform will destroy, then upon confirming we should see the following output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Plan: 0 to add, 0 to change, 1 to destroy.

Changes to Outputs:
  - instance_id = "i-005c08b899229fff0" -> null
  - ip          = "34.253.196.167" -> null

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

aws_instance.ec2: Destroying... [id=i-005c08b899229fff0]
aws_instance.ec2: Still destroying... [id=i-005c08b899229fff0, 10s elapsed]
aws_instance.ec2: Still destroying... [id=i-005c08b899229fff0, 20s elapsed]
aws_instance.ec2: Still destroying... [id=i-005c08b899229fff0, 30s elapsed]
aws_instance.ec2: Destruction complete after 31s

Destroy complete! Resources: 1 destroyed.

If you followed along and you also want to clean up the AWS IAM user, head over to the AWS IAM Console and delete the “medium-terraform” IAM User.

Thank You

I hope you enjoyed this post, I will be posting more terraform related content.

Should you want to reach out to me, you can follow me on Twitter at @ruanbekker or check out my website at https://ruan.dev

Getting Started With FerretDB on Docker

how-to-run-ferretdb-on-docker

In this post we will have a look at FerretDB which is a opensource proxy that translates MongoDB queries to SQL, where PostgreSQL being the database engine.

More about FerretDB

From FerretDB website, they describe FerretDB as:

Initially built as open-source software, MongoDB was a game-changer for many developers, enabling them to build fast and robust applications. Its ease of use and extensive documentation made it a top choice for many developers looking for an open-source database. However, all this changed when they switched to an SSPL license, moving away from their open-source roots.

In light of this, FerretDB was founded to become the true open-source alternative to MongoDB, making it the go-to choice for most MongoDB users looking for an open-source alternative to MongoDB. With FerretDB, users can run the same MongoDB protocol queries without needing to learn a new language or command.

What can you expect from this tutorial

We will be doing the following:

  • deploying ferretdb and postgres on docker containers using docker compose
  • then use mongosh as a client to logon to ferretdb using the ferretdb endpoint
  • explore some example queries to insert and read data from ferretdb
  • use scripting to generate data into ferretedb
  • explore the embedded prometheus endpoint for metrics

Deploy FerretDB

The following docker-compose.yaml defines a postgres container which will be used as the database engine for ferretdb, and then we define the ferretdb container, which connects to postgres via the environment variable FERRETDB_POSTGRESQL_URL.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
version: "3.9"

services:
  postgres:
    image: postgres:14.8-bullseye
    container_name: postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=ferret
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=ferretdb
    volumes:
      - pgvol:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready", "-d", "db_prod"]
      interval: 30s
      timeout: 15s
      retries: 5
      start_period: 60s
    networks:
      - ferretdb
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  ferretdb:
    image: ghcr.io/ferretdb/ferretdb:1.1.0
    container_name: ferretdb
    restart: unless-stopped
    ports:
      - 27017:27017
      - 8080:8080
    environment:
      - FERRETDB_POSTGRESQL_URL=postgres://postgres:5432/ferretdb
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - ferretdb
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

networks:
  ferretdb:
    name: ferretdb

volumes:
  pgvol:

Once you have the content above saved in docker-compose.yaml you can run the following to run the containers in a detached mode:

1
docker-compose up -d

Connect to FerretDB

Once the containers started, we can connect to our ferretdb server using mongosh, which is a shell utility to connect to the database). I will make use of a container to do this, where I will reference the network which we defined in our docker compose file, and set the endpoint that mongosh need to connect to:

1
docker run --rm -it --network=ferretdb --entrypoint=mongosh mongo:6.0 "mongodb://ferret:password@ferretdb/ferretdb?authMechanism=PLAIN"

Once it successfully connects to ferretdb, we should see the following prompt:

1
2
3
4
5
6
Current Mongosh Log ID:  64626c5c259916d1a68b7dad
Connecting to:        mongodb://<credentials>@ferretdb/ferretdb?authMechanism=PLAIN&directConnection=true&appName=mongosh+1.8.2
Using MongoDB:        6.0.42
Using Mongosh:        1.8.2

ferretdb>

Run example queries on FerretDB

If you are familiar with MongoDB, you will find the following identical to MongoDB.

First we show the current databases:

1
2
ferretdb> show dbs;
public  0 B

The we create and use the database named mydb:

1
2
ferretdb> use mydb
switched to db mydb

To see which database are we currently connected to:

1
2
mydb> db
mydb

Now we can create a collection named mycol1 and mycol2:

1
2
3
4
mydb> db.createCollection("mycol1")
{ ok: 1 }
mydb> db.createCollection("mycol2")
{ ok: 1 }

We can view our collections by running the following:

1
2
3
mydb> show collections
mycol1
mycol2

To write one document into our collection named col1 with the following data:

1
2
3
4
5
6
7
8
9
{
  "name": "ruan",
  "age": 32,
  "hobbies": [
    "golf",
    "programming",
    "music"
  ]
}

We can execute:

1
2
3
4
5
mydb> db.mycol1.insertOne({"name": "ruan", "age": 32, "hobbies": ["golf", "programming", "music"]})
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("64626cea259916d1a68b7dae") }
}

And we can insert another document:

1
2
3
4
5
mydb> db.mycol1.insertOne({"name": "michelle", "age": 28, "hobbies": ["art", "music", "reading"]})
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("64626cf1259916d1a68b7daf") }
}

We can then use countDocuments() to view the number of documents in our collection named mycol1:

1
2
ferretdb> db.mycol1.countDocuments()
2

If we want to find all our documents in our mycol1 collection:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mydb> db.mycol1.find()
[
  {
    _id: ObjectId("64626cea259916d1a68b7dae"),
    name: 'ruan',
    age: 32,
    hobbies: [ 'golf', 'programming', 'music' ]
  },
  {
    _id: ObjectId("64626cf1259916d1a68b7daf"),
    name: 'michelle',
    age: 28,
    hobbies: [ 'art', 'music', 'reading' ]
  }
]

If we want to only display specific fields in our response, such as name and age, we can project fields to return from our query:

1
2
3
4
5
6
7
8
9
mydb> db.mycol1.find({}, {"name": 1, "age": 1})
[
  { _id: ObjectId("64626cea259916d1a68b7dae"), name: 'ruan', age: 32 },
  {
    _id: ObjectId("64626cf1259916d1a68b7daf"),
    name: 'michelle',
    age: 28
  }
]

We can also suppress the _id field by setting the value to 0:

1
2
3
4
5
mydb> db.mycol1.find({}, {"_id": 0, "name": 1, "age": 1})
[
  { name: 'ruan', age: 32 },
  { name: 'michelle', age: 28 }
]

Next we can return all the fields name and age from our collection where the age field is equals to 32:

1
2
mydb> db.mycol1.find({"age": 32}, {"_id": 0, "name": 1, "age": 1})
[ { name: 'ruan', age: 32 } ]

We can also find a specific document by its id as example, and return only the field value, like name:

1
2
mydb> db.mycol1.findOne({_id: ObjectId("64626cea259916d1a68b7dae")}).name
ruan

Next we will find all documents where the age is greater than 30:

1
2
3
4
5
6
7
8
9
mydb> db.mycol1.find({"age": {"$gt": 30}})
[
  {
    _id: ObjectId("64626cea259916d1a68b7dae"),
    name: 'ruan',
    age: 32,
    hobbies: [ 'golf', 'programming', 'music' ]
  }
]

Let’s explore how to insert many documents at once using insertMany(), first create a new collection:

1
2
ferretdb> db.createCollection("mycol2")
{ ok: 1 }

We can then define the docs variable, and assign a array with 2 json documents:

1
ferretdb> var docs = [{name: "peter", age: 34, hobbies: ["ski", "programming", "music"]}, {name: "sam", age: 39, hobbies: ["running", "camping", "music"]}]

Now we can insert our documents to ferretdb using insertMany():

1
2
3
4
5
6
7
8
ferretdb> db.mycol2.insertMany(docs)
{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId("6464ceb1413cee26e9bf709f"),
    '1': ObjectId("6464ceb1413cee26e9bf70a0")
  }
}

We can count the documents inside our collection using:

1
2
ferretdb> db.mycol2.countDocuments()
2

And we can search for all the documents inside the collection:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
ferretdb> db.mycol2.find()
[
  {
    _id: ObjectId("6464ceb1413cee26e9bf709f"),
    name: 'peter',
    age: 34,
    hobbies: [ 'ski', 'programming', 'music' ]
  },
  {
    _id: ObjectId("6464ceb1413cee26e9bf70a0"),
    name: 'sam',
    age: 39,
    hobbies: [ 'running', 'camping', 'music' ]
  }
]

And searching for any data using the name peter:

1
2
3
4
5
6
7
8
9
ferretdb> db.mycol2.find({name: "peter"})
[
  {
    _id: ObjectId("6464ceb1413cee26e9bf709f"),
    name: 'peter',
    age: 34,
    hobbies: [ 'ski', 'programming', 'music' ]
  }
]

Scripting

We will create a script so that we can generate data that we want to write into FerretDB.

Create the following script, write.js:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
var txs = []
for (var x = 0; x < 1000 ; x++) {
 var transaction_types = ["credit card", "cash", "account"];
 var store_names = ["edgards", "cna", "makro", "picknpay", "checkers"];
 var random_transaction_type = Math.floor(Math.random() * (2 - 0 + 1)) + 0;
 var random_store_name = Math.floor(Math.random() * (4 - 0 + 1)) + 0;
 var random_age = Math.floor(Math.random() * (80 - 18) + 18)
 txs.push({
   transaction: 'tx_' + x,
   transaction_price: Math.round(Math.random()*1000),
   transaction_type: transaction_types[random_transaction_type],
   store_name: store_names[random_store_name],
   age: random_age
   });
}
console.log("drop and recreate the collection")
db.mycollection1.drop()
db.createCollection("mycollection1")
console.log("insert documents into collection")
db.mycollection1.insertMany(txs)

The script will loop a 1000 times and create documents that will include fields of transaction_types, store_names, random_transaction_type, random_store_name and random_age.

Use docker, mount the file inside the container, point the database endpoint to ferretdb and load the file that we want to execute:

1
docker run --rm -it --network=ferretdb -v $PWD/write.js:/src/write.js --entrypoint=mongosh mongo:6.0 "mongodb://ferret:password@ferretdb/ferretdb?authMechanism=PLAIN" --eval 'load("/src/write.js")'

Now when we run a mongosh client:

1
docker run --rm -it --network=ferretdb -v $PWD/write.js:/src/write.js --entrypoint=mongosh mongo:6.0 "mongodb://ferret:password@ferretdb/ferretdb?authMechanism=PLAIN"

And we query for the store_name: picknpay and only show the transaction_type and transaction fields:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
ferretdb> db.mycollection1.find({"store_name": "picknpay"}, {_id: 0, transaction_type: 1, transaction: 1})
[
  { transaction_type: 'credit card', transaction: 'tx_3' },
  { transaction_type: 'cash', transaction: 'tx_9' },
  { transaction_type: 'account', transaction: 'tx_10' },
  { transaction_type: 'credit card', transaction: 'tx_15' },
  { transaction_type: 'credit card', transaction: 'tx_19' },
  { transaction_type: 'cash', transaction: 'tx_21' },
  { transaction_type: 'cash', transaction: 'tx_28' },
  { transaction_type: 'account', transaction: 'tx_31' },
  { transaction_type: 'cash', transaction: 'tx_37' },
  { transaction_type: 'cash', transaction: 'tx_39' },
  { transaction_type: 'account', transaction: 'tx_40' },
  { transaction_type: 'cash', transaction: 'tx_51' },
  { transaction_type: 'account', transaction: 'tx_52' },
  { transaction_type: 'cash', transaction: 'tx_58' },
  { transaction_type: 'credit card', transaction: 'tx_62' },
  { transaction_type: 'credit card', transaction: 'tx_65' },
  { transaction_type: 'account', transaction: 'tx_69' },
  { transaction_type: 'account', transaction: 'tx_71' },
  { transaction_type: 'cash', transaction: 'tx_72' },
  { transaction_type: 'account', transaction: 'tx_74' }
]

We can also use the --eval flag with the mongosh container to run ad-hoc queries such as counting documents for a collection:

1
2
3
4
docker run --rm -it --network=ferretdb \
  -v $PWD/write.js:/src/write.js:ro \
  --entrypoint=mongosh mongo:6.0 \
  "mongodb://ferret:password@ferretdb/ferretdb?authMechanism=PLAIN" --eval 'db.mycollection1.countDocuments()'

Prometheus Metrics

FerretDB provides prometheus metrics out of the box, and outputs prometheus metrics on the :8080/debug/metrics endpoint:

1
curl http://localhost:8080/debug/metrics

Which will output metrics more or less like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
ferretdb_client_accepts_total{error="0"} 98
ferretdb_client_connected 0
ferretdb_client_requests_total{command="aggregate",opcode="OP_MSG"} 5
ferretdb_client_requests_total{command="atlasVersion",opcode="OP_MSG"} 27
ferretdb_client_requests_total{command="buildInfo",opcode="OP_MSG"} 27
ferretdb_client_requests_total{command="buildinfo",opcode="OP_MSG"} 2
ferretdb_client_requests_total{command="count",opcode="OP_MSG"} 5
ferretdb_client_requests_total{command="create",opcode="OP_MSG"} 7
ferretdb_client_requests_total{command="drop",opcode="OP_MSG"} 3
ferretdb_client_requests_total{command="dropDatabase",opcode="OP_MSG"} 4
ferretdb_client_requests_total{command="find",opcode="OP_MSG"} 27
ferretdb_client_requests_total{command="getCmdLineOpts",opcode="OP_MSG"} 27
ferretdb_client_requests_total{command="getFreeMonitoringStatus",opcode="OP_MSG"} 20
ferretdb_client_requests_total{command="getLog",opcode="OP_MSG"} 20
ferretdb_client_requests_total{command="getParameter",opcode="OP_MSG"} 27
ferretdb_client_requests_total{command="hello",opcode="OP_MSG"} 20
ferretdb_client_requests_total{command="insert",opcode="OP_MSG"} 15
ferretdb_client_requests_total{command="ismaster",opcode="OP_MSG"} 238
ferretdb_client_requests_total{command="listCollections",opcode="OP_MSG"} 49
ferretdb_client_requests_total{command="listDatabases",opcode="OP_MSG"} 12
ferretdb_client_requests_total{command="ping",opcode="OP_MSG"} 40
ferretdb_client_requests_total{command="saslStart",opcode="OP_MSG"} 70
ferretdb_client_requests_total{command="setFreeMonitoring",opcode="OP_MSG"} 1
ferretdb_client_requests_total{command="unknown",opcode="OP_QUERY"} 96
ferretdb_client_responses_total{argument="unknown",command="aggregate",opcode="OP_MSG",result="ok"} 5
ferretdb_client_responses_total{argument="unknown",command="atlasVersion",opcode="OP_MSG",result="CommandNotFound"} 27
ferretdb_client_responses_total{argument="unknown",command="buildInfo",opcode="OP_MSG",result="ok"} 27
ferretdb_client_responses_total{argument="unknown",command="buildinfo",opcode="OP_MSG",result="ok"} 2
ferretdb_client_responses_total{argument="unknown",command="count",opcode="OP_MSG",result="ok"} 5
ferretdb_client_responses_total{argument="unknown",command="create",opcode="OP_MSG",result="ok"} 7
ferretdb_client_responses_total{argument="unknown",command="drop",opcode="OP_MSG",result="NamespaceNotFound"} 2
ferretdb_client_responses_total{argument="unknown",command="drop",opcode="OP_MSG",result="ok"} 1
ferretdb_client_responses_total{argument="unknown",command="dropDatabase",opcode="OP_MSG",result="ok"} 4
ferretdb_client_responses_total{argument="unknown",command="find",opcode="OP_MSG",result="ok"} 27
ferretdb_client_responses_total{argument="unknown",command="getCmdLineOpts",opcode="OP_MSG",result="ok"} 27
ferretdb_client_responses_total{argument="unknown",command="getFreeMonitoringStatus",opcode="OP_MSG",result="ok"} 20
ferretdb_client_responses_total{argument="unknown",command="getLog",opcode="OP_MSG",result="ok"} 20
ferretdb_client_responses_total{argument="unknown",command="getParameter",opcode="OP_MSG",result="Unset"} 27
ferretdb_client_responses_total{argument="unknown",command="hello",opcode="OP_MSG",result="ok"} 20
ferretdb_client_responses_total{argument="unknown",command="insert",opcode="OP_MSG",result="ok"} 15
ferretdb_client_responses_total{argument="unknown",command="ismaster",opcode="OP_MSG",result="ok"} 238
ferretdb_client_responses_total{argument="unknown",command="listCollections",opcode="OP_MSG",result="ok"} 49
ferretdb_client_responses_total{argument="unknown",command="listDatabases",opcode="OP_MSG",result="ok"} 12
ferretdb_client_responses_total{argument="unknown",command="ping",opcode="OP_MSG",result="ok"} 40
ferretdb_client_responses_total{argument="unknown",command="saslStart",opcode="OP_MSG",result="ok"} 70
ferretdb_client_responses_total{argument="unknown",command="setFreeMonitoring",opcode="OP_MSG",result="ok"} 1
ferretdb_client_responses_total{argument="unknown",command="unknown",opcode="OP_REPLY",result="ok"} 93
ferretdb_client_responses_total{argument="unknown",command="unknown",opcode="OP_REPLY",result="unhandled"} 3
ferretdb_up{branch="unknown",commit="3344cbb98bb744dd044bcf2d51fe9ab65db22f0b",debug="false",dirty="true",package="docker",telemetry="disabled",update_available="false",uuid="08174d33-05fd-45ed-adb9-d2e343e0af83",version="v1.1.0"} 1
process_cpu_seconds_total 16.98
process_max_fds 1.048576e+06
process_open_fds 13
process_resident_memory_bytes 2.5714688e+07
process_start_time_seconds 1.68425346762e+09
process_virtual_memory_bytes 7.52529408e+08
process_virtual_memory_max_bytes 1.8446744073709552e+19
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
promhttp_metric_handler_requests_in_flight 1
promhttp_metric_handler_requests_total{code="200"} 2
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Resources

Please see the follwoing resources for FerretDB:

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

How to Run a AMD64 Bit Linux VM on a Mac M1

This tutorial will show you how you can run 64bit Ubuntu Linux Virtual Machines on a Apple Mac M1 arm64 architecture macbook using UTM.

Installation

Head over to their documentation and download the UTM.dmg file and install it, once it is installed and you have opened UTM, you should see this screen:

image

Creating a Virtual Machine

In my case I would like to run a Ubuntu VM, so head over to the Ubuntu Server Download page and download the version of choice, I will be downloading Ubuntu Server 22.04, once you have your ISO image downloaded, you can head over to the next step which is to “Create a New Virtual Machine”:

image

I will select “Emulate” as I want to run a amd64 bit architecture, then select “Linux”:

image

In the next step we want to select the Ubuntu ISO image that we downloaded, which we want to use to boot our VM from:

image

Browse and select the image that you downloaded, once you selected it, it should show something like this:

image

Select continue, then select the architecture to x86_64, the system I kept on defaults and the memory I have set to 2048MB and cores to 2 but that is just my preference:

image

The next screen is to configure storage, as this is for testing I am setting mine to 8GB:

image

The next screen is shared directories, this is purely optional, I have created a directory for this:

1
mkdir ~/utm

Which I’ve then defined for a shared directory, but this depends if you need to have shared directories from your local workstation.

The next screen is a summary of your choices and you can name your vm here:

image

Once you are happy select save, and you should see something like this:

image

You can then select the play button to start your VM.

The console should appear and you can select install or try this vm:

image

This will start the installation process of a Linux Server:

image

Here you can select the options that you would like, I would just recommend to ensure that you select Install OpenSSH Server so that you can connect to your VM via SSH.

Once you get to this screen:

image

The installation process is busy and you will have to wait a couple of minutes for it to complete. Once you see the following screen the installation is complete:

image

On the right hand side select the circle, then select CD/DVD and select the ubuntu iso and select eject:

image

Starting your VM

Then power off the guest and power on again, then you should get a console login, then you can proceed to login, and view the ip address:

SSH to your VM

Now from your terminal you should be able to ssh to the VM:

We can also verify that we are running a 64bit vm, by running uname --processor:

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Running a Multi-Broker Kafka Cluster on Docker

In this post we will run a Kakfa cluster with 3 kafka brokers on docker compose and using a producer to send messages to our topics and a consumer that will receive the messages from the topics, which we will develop in python and explore the kafka-ui.

What is Kafka?

Kafka is a distributed event store and stream processing platform. Kafka is used to build real-time streaming data pipelines and real-time streaming applications.

This is a fantastic resource if you want to understand the components better in detail: - apache-kafka-architecture-what-you-need-to-know

But on a high level, the components of a typical Kafka setup:

  1. Zookeeper: Kafka relies on Zookeeper to do leadership election of Kafka Brokers and Topic Partitions.
  2. Broker: Kafka server that receives messages from producers, assigns them to offsets and commit the messages to disk storage. A offset is used for data consistency in a event of failure, so that consumers know from where to consume from their last message.
  3. Topic: A topic can be thought of categories to organize messages. Producers writes messages to topics, consumers reads from those topics.
  4. Partitions: A topic is split into multiple partitions. This improves scalability through parallelism (not just one broker). Kafka also does replication

For great in detail information about kafka and its components, I encourage you to visit the mentioned post from above.

Launch Kafka

This is the docker-compose.yaml that we will be using to run a kafka cluster with 3 broker containers, 1 zookeeper container, 1 producer, 1 consumer and a kafka-ui.

All the source code is available on my quick-starts github repository .

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
version: "3.9"

services:
  zookeeper:
    platform: linux/amd64
    image: confluentinc/cp-zookeeper:${CONFLUENT_PLATFORM_VERSION:-7.4.0}
    container_name: zookeeper
    restart: unless-stopped
    ports:
      - '32181:32181'
      - '2888:2888'
      - '3888:3888'
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 32181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper:2888:3888
    healthcheck:
      test: echo stat | nc localhost 32181
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - kafka
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:latest
    ports:
      - 8080:8080
    depends_on:
      - broker-1
      - broker-2
      - broker-3
    environment:
      KAFKA_CLUSTERS_0_NAME: broker-1
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: broker-1:29091
      KAFKA_CLUSTERS_0_METRICS_PORT: 19101
      KAFKA_CLUSTERS_1_NAME: broker-2
      KAFKA_CLUSTERS_1_BOOTSTRAPSERVERS: broker-2:29092
      KAFKA_CLUSTERS_1_METRICS_PORT: 19102
      KAFKA_CLUSTERS_2_NAME: broker-3
      KAFKA_CLUSTERS_2_BOOTSTRAPSERVERS: broker-3:29093
      KAFKA_CLUSTERS_2_METRICS_PORT: 19103
      DYNAMIC_CONFIG_ENABLED: 'true'
    networks:
      - kafka
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  broker-1:
    platform: linux/amd64
    image: confluentinc/cp-kafka:${CONFLUENT_PLATFORM_VERSION:-7.4.0}
    container_name: broker-1
    restart: unless-stopped
    ports:
      - '9091:9091'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://broker-1:29091,EXTERNAL://localhost:9091
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_NUM_PARTITIONS: 3
      KAFKA_JMX_PORT: 19101
      KAFKA_JMX_HOSTNAME: localhost
    healthcheck:
      test: nc -vz localhost 9091
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - kafka
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  broker-2:
    platform: linux/amd64
    image: confluentinc/cp-kafka:${CONFLUENT_PLATFORM_VERSION:-7.4.0}
    container_name: broker-2
    restart: unless-stopped
    ports:
      - '9092:9092'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://broker-2:29092,EXTERNAL://localhost:9092
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_NUM_PARTITIONS: 3
      KAFKA_JMX_PORT: 19102
      KAFKA_JMX_HOSTNAME: localhost
    healthcheck:
      test: nc -vz localhost 9092
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - kafka
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  broker-3:
    platform: linux/amd64
    image: confluentinc/cp-kafka:${CONFLUENT_PLATFORM_VERSION:-7.4.0}
    container_name: broker-3
    restart: unless-stopped
    ports:
      - '9093:9093'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://broker-3:29093,EXTERNAL://localhost:9093
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_NUM_PARTITIONS: 3
      KAFKA_JMX_PORT: 19103
      KAFKA_JMX_HOSTNAME: localhost
    healthcheck:
      test: nc -vz localhost 9093
      interval: 10s
      timeout: 10s
      retries: 3
    networks:
      - kafka
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  producer:
    platform: linux/amd64
    container_name: producer
    image: ruanbekker/kafka-producer-consumer:2023-05-17
    # source: https://github.com/ruanbekker/quick-starts/tree/main/docker/kafka/python-client
    restart: always
    environment:
      - ACTION=producer
      - BOOTSTRAP_SERVERS=broker-1:29091,broker-2:29092,broker-3:29093
      - TOPIC=my-topic
      - PYTHONUNBUFFERED=1 # https://github.com/docker/compose/issues/4837#issuecomment-302765592
    networks:
      - kafka
    depends_on:
      - zookeeper
      - broker-1
      - broker-2
      - broker-3
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

  consumer:
    platform: linux/amd64
    container_name: consumer
    image: ruanbekker/kafka-producer-consumer:2023-05-17
    # source: https://github.com/ruanbekker/quick-starts/tree/main/docker/kafka/python-client
    restart: always
    environment:
      - ACTION=consumer
      - BOOTSTRAP_SERVERS=broker-1:29091,broker-2:29092,broker-3:29093
      - TOPIC=my-topic
      - CONSUMER_GROUP=cg-group-id
      - PYTHONUNBUFFERED=1 # https://github.com/docker/compose/issues/4837#issuecomment-302765592
    networks:
      - kafka
    depends_on:
      - zookeeper
      - broker-1
      - broker-2
      - broker-3
      - producer
    logging:
      driver: "json-file"
      options:
        max-size: "1m"

networks:
  kafka:
    name: kafka

Note: This docker-compose yaml can be found in my kafka quick-starts repository.

In our compose file we defined our core stack:

  • 1 Zookeeper Container
  • 3 Kafka Broker Containers
  • 1 Kafka UI

Then we have our clients:

We can boot the stack with:

1
docker-compose up -d

You can verify that the brokers are passing their health checks with:

1
2
3
4
5
6
7
8
9
10
docker-compose ps

NAME                IMAGE                                           COMMAND                  SERVICE             CREATED             STATUS                   PORTS
broker-1            confluentinc/cp-kafka:7.4.0                     "/etc/confluent/dock…"   broker-1            5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
broker-2            confluentinc/cp-kafka:7.4.0                     "/etc/confluent/dock…"   broker-2            5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
broker-3            confluentinc/cp-kafka:7.4.0                     "/etc/confluent/dock…"   broker-3            5 minutes ago       Up 4 minutes (healthy)   9092/tcp, 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
consumer            ruanbekker/kafka-producer-consumer:2023-05-17   "sh /src/run.sh $ACT…"   consumer            5 minutes ago       Up 4 minutes
kafka-ui            provectuslabs/kafka-ui:latest                   "/bin/sh -c 'java --…"   kafka-ui            5 minutes ago       Up 4 minutes             0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
producer            ruanbekker/kafka-producer-consumer:2023-05-17   "sh /src/run.sh $ACT…"   producer            5 minutes ago       Up 4 minutes
zookeeper           confluentinc/cp-zookeeper:7.4.0                 "/etc/confluent/dock…"   zookeeper           5 minutes ago       Up 5 minutes (healthy)   0.0.0.0:2888->2888/tcp, :::2888->2888/tcp, 0.0.0.0:3888->3888/tcp, :::3888->3888/tcp, 2181/tcp, 0.0.0.0:32181->32181/tcp, :::32181->32181/tcp

Producers and Consumers

The producer generates random data and sends it to a topic, where the consumer will listen on the same topic and read messages from that topic.

To view the output of what the producer is doing, you can tail the logs:

1
2
3
4
5
6
7
8
docker logs -f producer

setting up producer, checking if brokers are available
brokers not available yet
brokers are available and ready to produce messages
message sent to kafka with squence id of 1
message sent to kafka with squence id of 2
message sent to kafka with squence id of 3

And to view the output of what the consumer is doing, you can tail the logs:

1
2
3
4
5
6
7
docker logs -f consumer

starting consumer, checks if brokers are availabe
brokers not availbe yet
brokers are available and ready to consume messages
{'sequence_id': 10, 'user_id': '20520', 'transaction_id': '4026fd10-2aca-4d2e-8bd2-8ef0201af2dd', 'product_id': '17974', 'address': '71741 Lopez Throughway | South John | BT', 'signup_at': '2023-05-11 06:54:52', 'platform_id': 'Tablet', 'message': 'transaction made by userid 119740995334901'}
{'sequence_id': 11, 'user_id': '78172', 'transaction_id': '4089cee1-0a58-4d9b-9489-97b6bc4b768f', 'product_id': '21477', 'address': '735 Jasmine Village Apt. 009 | South Deniseland | BN', 'signup_at': '2023-05-17 09:54:10', 'platform_id': 'Tablet', 'message': 'transaction made by userid 159204336307945'}

Kafka UI

The Kafka UI will be available on http://localhost:8080

Where we can view lots of information, but in the below screenshot we can see our topics:

image

And when we look at the my-topic, we can see a overview dashboard of our topic information:

image

We can also look at the messages in our topic, and also search for messages:

image

And we can also look at the current consumers:

image

Resources

My Quick-Starts Github Repository:

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Manage Helm Releases With Terraform

helm-releases-with-terraform

In this post we will use terraform to deploy a helm release to kubernetes.

Kubernetes

For this demonstration I will be using kind to deploy a local Kubernetes cluster to the operating system that I am running this on, which will be Ubuntu Linux. For a more in-depth tutorial on Kind, you can see my post on Kind for Local Kubernetes Clusters.

Installing the Pre-Requirements

We will be installing terraform, docker, kind and kubectl on Linux.

Install terraform:

1
2
3
4
wget https://releases.hashicorp.com/terraform/1.3.0/terraform_1.3.0_linux_amd64.zip
unzip terraform_1.3.0_linux_amd64.zip
rm terraform_1.3.0_linux_amd64.zip
mv terraform /usr/bin/terraform

Verify that terraform has been installed:

1
terraform -version

Which in my case returns:

1
2
Terraform v1.3.0
on linux_amd64

Install Docker on Linux (be careful to curl pipe bash - trust the scripts that you are running):

1
curl https://get.docker.com | bash

Then running docker ps should return:

1
CONTAINER ID   IMAGE        COMMAND         CREATED          STATUS          PORTS       NAMES

Install kind on Linux:

1
2
3
4
apt update
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

Then to verify that kind was installed with kind --version should return:

1
kind version 0.17.0

Create a kubernetes cluster using kind:

1
kind create cluster --name rbkr --image kindest/node:v1.24.0

Now install kubectl:

1
2
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Then to verify that kubectl was installed:

1
kubectl version --client

Which in my case returns:

1
2
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7

Now we can test if kubectl can communicate with the kubernetes api server:

1
kubectl get nodes

In my case it returns:

1
2
NAME                 STATUS   ROLES           AGE     VERSION
rbkr-control-plane   Ready    control-plane   6m20s   v1.24.0

Terraform

Now that our pre-requirements are sorted we can configure terraform to communicate with kubernetes. For that to happen, we need to consult the terraform kubernetes provider’s documentation.

As per their documentation they provide us with this snippet:

1
2
3
4
5
6
7
8
9
10
11
12
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      version = "2.18.0"
    }
  }
}

provider "kubernetes" {
  # Configuration options
}

And from their main page, it gives us a couple of options to configure the provider and the easiest is probably to read the ~/.kube/config configuration file.

But in cases where you have multiple configurations in your kube config file, this might not be ideal, and I like to be precise, so I will extract the client certificate, client key and cluster ca certificate and endpoint from our ~/.kube/config file.

If we run cat ~/.kube/config we will see something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRU......FURS0tLS0tCg==
    server: https://127.0.0.1:40305
  name: kind-rbkr
contexts:
- context:
    cluster: kind-rbkr
    user: kind-rbkr
  name: kind-rbkr
current-context: kind-rbkr
kind: Config
preferences: {}
users:
- name: kind-rbkr
  user:
    client-certificate-data: LS0tLS1CRX......FURS0tLS0tCg==
    client-key-data: LS0tLS1CRUejhKWUk2N2.....S0tCg==

First we will create a directory for our certificates:

1
mkdir ~/certs

I have truncated my kube config for readability, but for our first file certs/client-cert.pem we will copy the value of client-certificate-data:, which will look something like this:

1
2
cat certs/client-cert.pem
LS0tLS1CRX......FURS0tLS0tCg==

Then we will copy the contents of client-key-data: into certs/client-key.pem and then lastly the content of certificate-authority-data: into certs/cluster-ca-cert.pem.

So then we should have the following files inside our certs/ directory:

1
2
3
4
5
6
7
tree certs/
certs/
├── client-cert.pem
├── client-key.pem
└── cluster-ca-cert.pem

0 directories, 3 files

Now make them read only:

1
chmod 400 ~/certs/*

Now that we have that we can start writing our terraform configuration. In providers.tf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      version = "2.18.0"
    }
  }
}

provider "kubernetes" {
  host                   = "https://127.0.0.1:40305"
  client_certificate     = base64decode(file("~/certs/client-cert.pem"))
  client_key             = base64decode(file("~/certs/client-key.pem"))
  cluster_ca_certificate = base64decode(file("~/certs/cluster-ca-cert.pem"))
}

Your host might look different to mine, but you can find your host endpoint in ~/.kube/config.

For a simple test we can list all our namespaces to ensure that our configuration is working. In a file called namespaces.tf, we can populate the following:

1
2
3
4
5
data "kubernetes_all_namespaces" "allns" {}

output "all-ns" {
  value = data.kubernetes_all_namespaces.allns.namespaces
}

Now we need to initialize terraform so that it can download the providers:

1
terraform init

Then we can run a plan which will reveal our namespaces:

1
2
3
4
5
6
7
8
9
10
11
12
13
terraform plan

data.kubernetes_all_namespaces.allns: Reading...
data.kubernetes_all_namespaces.allns: Read complete after 0s [id=a0ff7e83ffd7b2d9953abcac9f14370e842bdc8f126db1b65a18fd09faa3347b]

Changes to Outputs:
  + all-ns = [
      + "default",
      + "kube-node-lease",
      + "kube-public",
      + "kube-system",
      + "local-path-storage",
    ]

We can now remove our namespaces.tf as our test worked:

1
rm namespaces.tf

Helm Releases with Terraform

We will need two things, we need to consult the terraform helm release provider documentation and we also need to consult the helm chart documentation which we are interested in.

In my previous post I wrote about Everything you need to know about Helm and I used the Bitnami Nginx Helm Chart, so we will use that one again.

As we are working with helm releases, we need to configure the helm provider, I will just extend my configuration from my previous provider config in providers.tf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      version = "2.18.0"
    }
    helm = {
      source = "hashicorp/helm"
      version = "2.9.0"
    }
  }
}

provider "kubernetes" {
  host                   = "https://127.0.0.1:40305"
  client_certificate     = base64decode(file("~/certs/client-cert.pem"))
  client_key             = base64decode(file("~/certs/client-key.pem"))
  cluster_ca_certificate = base64decode(file("~/certs/cluster-ca-cert.pem"))
}

provider "helm" {
  kubernetes {
    host                   = "https://127.0.0.1:40305"
    client_certificate     = base64decode(file("~/certs/client-cert.pem"))
    client_key             = base64decode(file("~/certs/client-key.pem"))
    cluster_ca_certificate = base64decode(file("~/certs/cluster-ca-cert.pem"))
  }
}

We will create three terraform files:

1
touch {main,outputs,variables}.tf

And our values yaml in helm-chart/nginx/values.yaml:

1
mkdir -p helm-chart/nginx

Then you can copy the values file from https://artifacthub.io/packages/helm/bitnami/nginx?modal=values into helm-chart/nginx/values.yaml.

In our main.tf I will use two ways to override values in our values.yaml using set and templatefile. The reason for the templatefile, is when we want to fetch a value and want to replace the content with our values file, it could be used when we retrieve a value from a data source as an example. In my example im just using a variable.

We will have the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
resource "helm_release" "nginx" {
  name             = var.release_name
  version          = var.chart_version
  namespace        = var.namespace
  create_namespace = var.create_namespace
  chart            = var.chart_name
  repository       = var.chart_repository_url
  dependency_update = true
  reuse_values      = true
  force_update      = true
  atomic              = var.atomic

  set {
    name  = "image.tag"
    value = "1.23.3-debian-11-r3"
  }

  set {
    name  = "service.type"
    value = "ClusterIP"
  }

  values = [
    templatefile("${path.module}/helm-chart/nginx/values.yaml", {
      NAME_OVERRIDE   = var.release_name
    }
  )]

}

As you can see we are referencing a NAME_OVERRIDE in our values.yaml, I have cleaned up the values file to the following:

1
2
3
4
5
6
7
nameOverride: "${NAME_OVERRIDE}"

## ref: https://hub.docker.com/r/bitnami/nginx/tags/
image:
  registry: docker.io
  repository: bitnami/nginx
  tag: 1.23.3-debian-11-r3

The NAME_OVERRIDE must be in a ${} format.

In our variables.tf we will have the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
variable "release_name" {
  type        = string
  default     = "nginx"
  description = "The name of our release."
}

variable "chart_repository_url" {
  type        = string
  default     = "https://charts.bitnami.com/bitnami"
  description = "The chart repository url."
}

variable "chart_name" {
  type        = string
  default     = "nginx"
  description = "The name of of our chart that we want to install from the repository."
}

variable "chart_version" {
  type        = string
  default     = "13.2.20"
  description = "The version of our chart."
}

variable "namespace" {
  type        = string
  default     = "apps"
  description = "The namespace where our release should be deployed into."
}

variable "create_namespace" {
  type        = bool
  default     = true
  description = "If it should create the namespace if it doesnt exist."
}

variable "atomic" {
  type        = bool
  default     = false
  description = "If it should wait until release is deployed."
}

And lastly our outputs.tf:

1
2
3
output "metadata" {
  value = helm_release.nginx.metadata
}

Now that we have all our configuration ready, we can initialize terraform:

1
terraform init

Then we can run a plan to see what terraform wants to deploy:

1
terraform plan

The plan output shows the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # helm_release.nginx will be created
  + resource "helm_release" "nginx" {
      + atomic                     = false
      + chart                      = "nginx"
      + cleanup_on_fail            = false
      + create_namespace           = true
      + dependency_update          = false
      + disable_crd_hooks          = false
      + disable_openapi_validation = false
      + disable_webhooks           = false
      + force_update               = false
      + id                         = (known after apply)
      + lint                       = false
      + manifest                   = (known after apply)
      + max_history                = 0
      + metadata                   = (known after apply)
      + name                       = "nginx"
      + namespace                  = "apps"
      + pass_credentials           = false
      + recreate_pods              = false
      + render_subchart_notes      = true
      + replace                    = false
      + repository                 = "https://charts.bitnami.com/bitnami"
      + reset_values               = false
      + reuse_values               = false
      + skip_crds                  = false
      + status                     = "deployed"
      + timeout                    = 300
      + values                     = [
          + <<-EOT
                nameOverride: "nginx"

                ## ref: https://hub.docker.com/r/bitnami/nginx/tags/
                image:
                  registry: docker.io
                  repository: bitnami/nginx
                  tag: 1.23.3-debian-11-r3
            EOT,
        ]
      + verify                     = false
      + version                    = "13.2.20"
      + wait                       = false
      + wait_for_jobs              = false

      + set {
          + name  = "image.tag"
          + value = "1.23.3-debian-11-r3"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + metadata = (known after apply)

Once we are happy with our plan, we can run a apply:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
terraform apply

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + metadata = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

helm_release.nginx: Creating...
helm_release.nginx: Still creating... [10s elapsed]

metadata = tolist([
  {
    "app_version" = "1.23.3"
    "chart" = "nginx"
    "name" = "nginx"
    "namespace" = "apps"
    "revision" = 1
    "values" = "{\"image\":{\"registry\":\"docker.io\",\"repository\":\"bitnami/nginx\",\"tag\":\"1.23.3-debian-11-r3\"},\"nameOverride\":\"nginx\"}"
    "version" = "13.2.20"
  },
])

Then we can verify if the pod is running:

1
2
3
kubectl get pods -n apps
NAME                    READY   STATUS    RESTARTS   AGE
nginx-59bdc6465-xdbfh   1/1     Running   0          2m35s

Importing Helm Releases into Terraform State

If you have an existing helm release that was deployed with helm and you want to transfer the ownership to terraform, you first need to write the terraform code, then import the resources into terraform state using:

1
terraform import helm_release.nginx apps/nginx

Where the last argument is <namespace>/<release-name>. Once that is imported you can run terraform plan and apply.

If you want to discover all helm releases managed by helm you can use:

1
kubectl get all -A -l app.kubernetes.io/managed-by=Helm

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Persisting Terraform Remote State in Gitlab

terraform-state-gitlab

In this tutorial we will demonstrate how to persist your terraform state in gitlab managed terraform state, using the terraform http backend.

For detailed information about this consult their documentation

What are we doing?

We will create a terraform pipeline which will run the plan step automatically and a manual step to run the apply step.

During these steps and different pipelines we need to persist our terraform state remotely so that new pipelines can read from our state what we last stored.

Gitlab offers a remote backend for our terraform state which we can use, and we will use a basic example of using the random resource.

Prerequisites

If you don’t see the “Infrastructure” menu on your left, you need to enable it at “Settings”, “General”, “Visibility”, “Project features”, “Permissions” and under “Operations”, turn on the toggle.

For more information on this see their documentation

Authentication

For this demonstration I created a token which is only scoped for this one project, for this we need a to create a token under, “Settings”, “Access Tokens”:

image

Select the api under scope:

image

Store the token name and token value as TF_USERNAME and TF_PASSWORD as a CICD variable under “Settings”, “CI/CD”, “Variables”.

Terraform Code

We will use a basic random_uuid resource for this demonstration, our main.tf:

1
2
3
4
5
6
resource "random_uuid" "uuid" {}

output "uuid" {
  value       = random_uuid.uuid.result
  sensitive   = false
}

Our providers.tf, you will notice the backend "http" {} is what is required for our gitlab remote state:

1
2
3
4
5
6
7
8
9
10
11
12
terraform {
  required_providers {
    random = {
      source = "hashicorp/random"
      version = "3.4.3"
    }
  }
  backend "http" {}
  required_version = "~> 1.3.6"
}

provider "random" {}

Push that up to gitlab for now.

Gitlab Pipeline

Our .gitlab-ci.yml consists of a plan step and a apply step which is a manual step as we first want to review our plan step before we apply.

Our pipeline will only run on the default branch, which in my case is main:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
image:
  name: hashicorp/terraform:1.3.6
  entrypoint: [""]

cache:
  paths:
    - .terraform

workflow:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    - when: never

variables:
  TF_ADDRESS: "https://gitlab.com/api/v4/projects/${CI_PROJECT_ID}/terraform/state/default-terraform.tfstate"

stages:
  - plan
  - apply

.terraform_init: &terraform_init
  - terraform init
      -backend-config=address=${TF_ADDRESS}
      -backend-config=lock_address=${TF_ADDRESS}/lock
      -backend-config=unlock_address=${TF_ADDRESS}/lock
      -backend-config=username=${TF_USERNAME}
      -backend-config=password=${TF_PASSWORD}
      -backend-config=lock_method=POST
      -backend-config=unlock_method=DELETE
      -backend-config=retry_wait_min=5

terraform:plan:
  stage: plan
  artifacts:
    paths:
      - '**/*.tfplan'
      - '**/.terraform.lock.hcl'
  before_script:
    - *terraform_init
  script:
    - terraform validate
    - terraform plan -input=false -out default.tfplan

terraform:apply:
  stage: apply
  artifacts:
    paths:
      - '**/*.tfplan'
      - '**/.terraform.lock.hcl'
  before_script:
    - *terraform_init
  script:
    - terraform apply -input=false -auto-approve default.tfplan
  when: manual

Where the magic happens is in the terraform init step, that is where we will initialize the terraform state in gitlab, and as you can see we are taking the TF_ADDRESS variable to define the path of our state and in this case our state file will be named default-terraform.tfstate.

If it was a case where you are deploying multiple environments, you can use something like ${ENVIRONMENT}-terraform.tfstate.

When we run our pipeline, we can look at our plan step:

image

Once we are happy with this we can run the manual step and do the apply step, then our pipeline should look like this:

image

When we inspect our terraform state in the infrastructure menu, we can see the state file was created:

image

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Everything You Need to Know About Helm

image

Helm, its one amazing piece of software that I use multiple times per day!

What is Helm?

You can think of helm as a package manager for kubernetes, but in fact its much more than that.

Think about it in the following way:

  • Kubernetes Package Manager
  • Way to templatize your applications (this is the part im super excited about)
  • Easy way to install applications to your kubernetes cluster
  • Easy way to do upgrades to your applications
  • Websites such as artifacthub.io provides a nice interface to lookup any application an how to install or upgrade that application.

How does Helm work?

Helm uses your kubernetes config to connect to your kubernetes cluster. In most cases it utilises the config defined by the KUBECONFIG environment variable, which in most cases points to ~/kube/config.

If you want to follow along, you can view the following blog post to provision a kubernetes cluster locally:

Once you have provisioned your kubernetes cluster locally, you can proceed to install helm, I will make the assumption that you are using Mac:

1
brew install helm

Once helm has been installed, you can test the installation by listing any helm releases, by running:

1
helm list

Helm Charts

Helm uses a packaging format called charts, which is a collection of files that describes a related set of kubernetes resources. A sinlge helm chart m ight be used to deploy something simple such as a deployment or something complex that deploys a deployment, ingress, horizontal pod autoscaler, etc.

Using Helm to deploy applications

So let’s assume that we have our kubernetes cluster deployed, and now we are ready to deploy some applications to kubernetes, but we are unsure on how we would do that.

Let’s assume we want to install Nginx.

First we would navigate to artifacthub.io, which is a repository that holds a bunch of helm charts and the information on how to deploy helm charts to our cluster.

Then we would search for Nginx, which would ultimately let us land on:

On this view, we have super useful information such as how to use this helm chart, the default values, etc.

Now that we have identified the chart that we want to install, we can have a look at their readme, which will indicate how to install the chart:

1
2
$ helm repo add my-repo https://charts.bitnami.com/bitnami
$ helm install my-release my-repo/nginx

But before we do that, if we think about it, we add a repository, then before we install a release, we could first find information such as the release versions, etc.

So the way I would do it, is to first add the repository:

1
$ helm repo add bitnami https://charts.bitnami.com/bitnami

Then since we have added the repository, we can update our repository to ensure that we have the latest release versions:

1
$ helm repo update

Now that we have updated our local repositories, we want to find the release versions, and we can do that by listing the repository in question. For example, if we don’t know the application name, we can search by the repository name:

1
$ helm search repo bitnami/ --versions

In this case we will get an output of all the applications that is currently being hosted by Bitnami.

If we know the repository and the release name, we can extend our search by using:

1
$ helm search repo bitnami/nginx --versions

In this case we get an output of all the Nginx release versions that is currently hosted by Bitnami.

Installing a Helm Release

Now that we have received a response from helm search repo, we can see that we have different release versions, as example:

1
2
3
NAME                             CHART VERSION   APP VERSION DESCRIPTION
bitnami/nginx                     13.2.22         1.23.3      NGINX Open Source is a web server that can be a...
bitnami/nginx                     13.2.21         1.23.3      NGINX Open Source is a web server that can be a...

For each helm chart, the chart has default values which means, when we install the helm release it will use the default values which is defined by the helm chart.

We have the concept of overriding the default values with a yaml configuration file we usually refer to values.yaml, that we can define the values that we want to override our default values with.

To get the current default values, we can use helm show values, which will look like the following:

1
$ helm show values bitnami/nginx --version 13.2.22

That will output to standard out, but we can redirect the output to a file using the following:

1
$ helm show values bitnami/nginx --version 13.2.22 > nginx-values.yaml

Now that we have redirected the output to nginx-values.yaml, we can inspect the default values using cat nginx-values.yaml, and any values that we see that we want to override, we can edit the yaml file and once we are done we can save it.

Now that we have our override values, we can install a release to our kubernetes cluster.

Let’s assume we want to install nginx to our cluster under the name my-nginx and we want to deploy it to the namespace called web-servers:

1
$ helm upgrade --install my-nginx bitnami/nginx --values nginx-values.yaml --namespace web-servers --create-namespace --version 13.2.22

In the example above, we defined the following:

  • upgrade --install - meaning we are installing a release, if already exists, do an upgrade
  • my-nginx - use the release name my-nginx
  • bitnami/nginx - use the repository and chart named nginx
  • --values nginx-values.yaml - define the values file with the overrides
  • --namespace web-servers --create-namespace - define the namespace where the release will be installed to, and create the namespace if not exists
  • --version 13.2.22 - specify the version of the chart to be installed

Information about the release

We can view information about our release by running:

1
$ helm list -n web-servers

Creating your own helm charts

It’s very common to create your own helm charts when you follow a common pattern in a microservice architecture or something else, where you only want to override specific values such as the container image, etc.

In this case we can create our own helm chart using:

1
2
3
$ mkdir ~/charts
$ cd ~/charts
$ helm create my-chart

This will create a scaffoliding project with the required information that we need to create our own helm chart. If we look at a tree view, it will look like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ tree .
.
└── my-chart
    ├── Chart.yaml
    ├── charts
    ├── templates
    │   ├── NOTES.txt
    │   ├── _helpers.tpl
    │   ├── deployment.yaml
    │   ├── hpa.yaml
    │   ├── ingress.yaml
    │   ├── service.yaml
    │   ├── serviceaccount.yaml
    │   └── tests
    │       └── test-connection.yaml
    └── values.yaml

4 directories, 10 files

This example chart can already be used, to see what this chart will produce when running it with helm, we can use the helm template command:

1
2
$ cd my-chart
$ helm template example . --values values.yaml

The output will be something like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
---
# Source: my-chart/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-my-chart
  labels:
    helm.sh/chart: my-chart-0.1.0
    app.kubernetes.io/name: my-chart
    app.kubernetes.io/instance: example
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: my-chart
          image: "nginx:1.16.0"
          ...
---
...

In our example it will create a service account, service, deployment, etc.

As you can see the spec.template.spec.containers[].image is set to nginx:1.16.0, and to see how that was computed, we can have a look at templates/deployment.yaml:

As you can see in image: section we have .Values.image.repository and .Values.image.tag, and those values are being retrieved from the values.yaml file, and when we look at the values.yaml file:

1
2
3
4
5
image:
  repository: nginx
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

If we want to override the image repository and image tag, we can update the values.yaml file to lets say:

1
2
3
4
image:
  repository: busybox
  tag: latest
  pullPolicy: IfNotPresent

When we run our helm template command again, we can see that the computed values changed to what we want:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ helm template example . --values values.yaml
---
# Source: my-chart/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-my-chart
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: my-chart
          image: "busybox:latest"
          imagePullPolicy: IfNotPresent
      ...

Another way is to use --set:

1
2
3
4
5
6
7
8
$ helm template example . --values values.yaml --set image.repository=ruanbekker/containers,image.tag=curl
spec:
  template:
    spec:
      containers:
        - name: my-chart
          image: "ruanbekker/containers:curl"
      ...

The template subcommand provides a great way to debug your charts. To learn more about helm charts, view their documentation.

Publish your Helm Chart to ChartMuseum

ChartMuseum is an open-source Helm Chart Repository server written in Go.

Running chartmuseum demonstration will be done locally on my workstation using Docker. To run the server:

1
2
3
4
5
6
7
$ docker run --rm -it \
  -p 8080:8080 \
  -e DEBUG=1 \
  -e STORAGE=local \
  -e STORAGE_LOCAL_ROOTDIR=/charts \
  -v $(pwd)/charts:/charts \
  ghcr.io/helm/chartmuseum:v0.14.0

Now that ChartMuseum is running, we will need to install a helm plugin called helm-push which helps to push charts to our chartmusuem repository:

1
$ helm plugin install https://github.com/chartmuseum/helm-push

We can verify if our plugin was installed:

1
2
3
$ helm plugin list
NAME      VERSION DESCRIPTION
cm-push   0.10.3  Push chart package to ChartMuseum

Now we add our chartmuseum helm chart repository, which we will call cm-local:

1
$ helm repo add cm-local http://localhost:8080/

We can list our helm repository:

1
2
3
$ helm repo list
NAME                  URL
cm-local              http://localhost:8080/

Now that our helm repository has been added, we can push our helm chart to our helm chart repository. Ensure that we are in our chart repository directory, where the Chart.yaml file should be in our current directory. We need this file as it holds metadata about our chart.

We can view the Chart.yaml:

1
2
3
4
5
6
apiVersion: v2
name: my-chart
description: A Helm chart for Kubernetes
type: application
version: 0.1.0
appVersion: "1.16.0"

Push the helm chart to chartmuseum:

1
2
3
$ helm cm-push . http://localhost:8080/ --version 0.0.1
Pushing my-chart-0.0.1.tgz to http://localhost:8080/...
Done.

Now we should update our repositories so that we can get the latest changes:

1
$ helm repo update

Now we can list the charts under our repository:

1
2
3
$ helm search repo cm-local/
NAME              CHART VERSION   APP VERSION DESCRIPTION
cm-local/my-chart 0.0.1           1.16.0      A Helm chart for Kubernetes

We can now get the values for our helm chart by running:

1
$ helm show values cm-local/my-chart

This returns the values yaml that we can use for our chart, so let’s say you want to output the values yaml so that we can use to to deploy a release we can do:

1
$ helm show values cm-local/my-chart > my-values.yaml

Now when we want to deploy a release, we can do:

1
$ helm upgrade --install my-release cm-local/my-chart --values my-values.yaml --namespace test --create-namespace --version 0.0.1

After the release was deployed, we can list the releases by running:

1
$ helm list

And to view the release history:

1
$ helm history my-release

Resources

Please find the following information with regards to Helm documentation: - helm docs - helm cart template guide

If you need a kubernetes cluster and you would like to run this locally, find the following documentation in order to do that: - using kind for local kubernetes clusters

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Getting Started With Wiremock

In this tutorial we will use docker to run an instance of wiremock to setup a mock api for us to test our api’s.

Wiremock

Wiremock is a tool for building mock API’s which enables us to build stable development environments.

Docker and Wiremock

Run a wiremock instance with docker:

1
docker run -it --rm -p 8080:8080 --name wiremock wiremock/wiremock:2.34.0

Then our wiremock instance will be exposed on port 8080 locally, which we can use to make a request against to create a api mapping:

1
2
3
4
curl -XPOST -H "Content-Type: application/json" \
  http://localhost:8080/__admin/mappings
  -d '{"request": {"url": "/testapi","method": "GET"}, "response": {"status": 200, "body": "{\"result\": \"ok\"
}", "headers": {"Content-Type": "application/json"}}}'

The response should be something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{
    "id" : "223a2c0a-8b43-42dc-8ba6-fe973da1e420",
    "request" : {
      "url" : "/testapi",
      "method" : "GET"
    },
    "response" : {
      "status" : 200,
      "body" : "{\"result\": \"ok\"}",
      "headers" : {
        "Content-Type" : "application/json"
      }
    },
    "uuid" : "223a2c0a-8b43-42dc-8ba6-fe973da1e420"
}

Test Wiremock

If we make a GET request against our API:

1
curl http://localhost:8080/testapi

Our response should be:

1
2
3
{
  "result": "ok"
}

Export Wiremock Mappings

We can export our mappings to a local file named stubs.json with:

1
curl -s http://localhost:8080/__admin/mappings --output stubs.json

Import Wiremock Mappings

We can import our mappings from our stubs.json file with:

1
curl -XPOST -v --data-binary @stubs.json http://localhost:8080/__admin/mappings/import

Resources

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.

Logging With Docker Promtail and Grafana Loki

grafana-loki-promtail

In this post we will use Grafana Promtail to collect all our logs and ship it to Grafana Loki.

About

We will be using Docker Compose and mount the docker socket to Grafana Promtail so that it is aware of all the docker events and configure it that only containers with docker labels logging=promtail needs to be enabled for logging, which will then scrape those logs and send it to Grafana Loki where we will visualize it in Grafana.

Promtail

In our promtail configuration config/promtail.yaml:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# https://grafana.com/docs/loki/latest/clients/promtail/configuration/
# https://docs.docker.com/engine/api/v1.41/#operation/ContainerList
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: flog_scrape
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["logging=promtail"]
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'logstream'
      - source_labels: ['__meta_docker_container_label_logging_jobname']
        target_label: 'job'

You can see we are using the docker_sd_configs provider and filter only docker containers with the docker labels logging=promtail and once we have those logs we relabel our labels to have the container name and we also use docker labels like log_stream and logging_jobname to add labels to our logs.

Grafana Config

We would like to auto configure our datasources for Grafana and in config/grafana-datasources.yml we have:

1
2
3
4
5
6
7
8
9
10
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    version: 1
    editable: false
    isDefault: true

Docker Compose

Then lastly we have our docker-compose.yml that wire up all our containers:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
version: '3.8'

services:
  nginx-app:
    container_name: nginx-app
    image: nginx
    labels:
      logging: "promtail"
      logging_jobname: "containerlogs"
    ports:
      - 8080:80
    networks:
      - app

  grafana:
    image: grafana/grafana:latest
    ports:
      - 3000:3000
    volumes:
      - ./config/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    networks:
      - app

  loki:
    image: grafana/loki:latest
    ports:
      - 3100:3100
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - app

  promtail:
    image:  grafana/promtail:latest
    container_name: promtail
    volumes:
      - ./config/promtail.yaml:/etc/promtail/docker-config.yaml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/docker-config.yaml
    depends_on:
      - loki
    networks:
      - app

networks:
  app:
    name: app

As you can see with our nginx container we define our labels:

1
2
3
4
5
6
  nginx-app:
    container_name: nginx-app
    image: nginx
    labels:
      logging: "promtail"
      logging_jobname: "containerlogs"

Which uses logging: "promtail" to let promtail know this log container’s log to be scraped and logging_jobname: "containerlogs" which will assign containerlogs to the job label.

Start the stack

If you are following along all this configuration is available in my github repository https://github.com/ruanbekker/docker-promtail-loki .

Once you have everything in place you can start it with:

1
docker-compose up -d

Access nginx on http://localhost:8080

image

Then navigate to grafana on http://localhost:3000 and select explore on the left and select the container:

image

And you will see the logs:

image

Thank You

Thanks for reading, feel free to check out my website, feel free to subscribe to my newsletter or follow me at @ruanbekker on Twitter.