Set Up a 5 Node Highly Available Elasticsearch Cluster
This is post 1 of my big collection of elasticsearch-tutorials, which includes setup, indexing, management, searching, etc. More details at the bottom.
In this tutorial we will set up a 5 node highly available elasticsearch cluster, consisting of 3 Elasticsearch Master Nodes and 2 Elasticsearch Data Nodes.
Master Nodes: Master nodes are responsible for cluster-related tasks: creating / deleting indexes, tracking nodes, allocating shards to nodes, etc.
Data Nodes: Data nodes host the actual shards that hold the indexed data, and handle data-related operations like CRUD, search, and aggregations.
For more concepts of Elasticsearch, have a look at their basic-concepts documentation.
es-data-1: 10GB assigned to /dev/vdb
es-data-2: 10GB assigned to /dev/vdb
Authentication:
Note that I have configured the bind address for elasticsearch to 0.0.0.0 using network.host: 0.0.0.0 for this demonstration, so that all the nodes can reach each other on this address. However, this also means that if your server has a public IP address with no firewall rules and no authentication, anyone will be able to interact with your cluster.
It's advisable to protect your endpoint, either with basic auth using nginx (which can be found in the embedded link), or with firewall rules that block communication from the outside (depending on your setup).
Set Up the Elasticsearch Master Nodes
The steps below show how to provision an elasticsearch master node. Repeat this on each of: es-master-1, es-master-2, es-master-3.
Set your hosts file for name resolution (if you don’t have private dns in place):
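A minimal /etc/hosts sketch, using the node IPs that appear in the cluster output later in this post (adjust to your own addressing):

```
# /etc/hosts - name resolution for the cluster nodes
10.163.68.5   es-master-1
10.163.68.8   es-master-2
10.163.68.4   es-master-3
10.163.68.11  es-data-1
10.163.68.7   es-data-2
```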
Before we get to the full example of the elasticsearch config, I just want to show a snippet of how you could split up logs and data.
Note that you can separate your logs and data like this:
# example of log splitting:
...
path:
logs: /var/log/elasticsearch
data: /var/data/elasticsearch
...
Also, your data can be divided between paths:
# example of data paths:
...
path:
data:
- /mnt/elasticsearch_1
- /mnt/elasticsearch_2
- /mnt/elasticsearch_3
...
Bootstrap the elasticsearch config with a cluster name (all the nodes should have the same cluster name), mark the nodes as master-eligible with node.master: true, disable node.data, and specify that the cluster requires a minimum of 2 master-eligible nodes to be visible before a master can be elected. This is used to prevent split brain.
To avoid a split brain, this setting should be set to a quorum of master-eligible nodes:
(master_eligible_nodes / 2) + 1
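As a sketch, a master node's /etc/elasticsearch/elasticsearch.yml could look like the following. This assumes the pre-7.x discovery.zen settings that match the node.role output shown later in this post; the cluster name es-cluster is an example of my own:

```yaml
# /etc/elasticsearch/elasticsearch.yml (master node sketch)
cluster.name: es-cluster             # must be identical on all nodes
node.name: es-master-1               # unique per node
node.master: true                    # eligible to be elected master
node.data: false                     # does not host shards
network.host: 0.0.0.0                # bind on all interfaces (see the auth note above)
discovery.zen.ping.unicast.hosts: ["es-master-1", "es-master-2", "es-master-3"]
discovery.zen.minimum_master_nodes: 2   # quorum: (3 / 2) + 1
```

With 3 master-eligible nodes, the quorum works out to (3 / 2) + 1 = 2.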
Ensure that pages are not swapped out to disk by requesting the JVM to lock the heap in memory, by setting LimitMEMLOCK=infinity. Set the maximum file descriptor number for this process with LimitNOFILE, and increase the number of threads using LimitNPROC:
$ vim /usr/lib/systemd/system/elasticsearch.service
[Service]
LimitMEMLOCK=infinity
LimitNOFILE=65535
LimitNPROC=4096
...
Increase the limit on the number of open file descriptors for the elasticsearch user to 65536 or higher.
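One way to do this is via /etc/security/limits.conf (a sketch, assuming elasticsearch runs as the elasticsearch user):

```
# /etc/security/limits.conf
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
```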
Have a look at the nodes; you will see that node.role shows mi (master, ingest) for now:
$ curl http://127.0.0.1:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.163.68.8 11 80 18 0.28 0.14 0.09 mi - es-master-2
10.163.68.5 14 80 14 0.27 0.18 0.11 mi * es-master-1
10.163.68.4 15 79 6 0.62 0.47 0.18 mi - es-master-3
Setup the Elasticsearch Data Nodes
Now that we have our 3 elasticsearch master nodes running, it's time to provision the 2 elasticsearch data nodes. This setup needs to be repeated on both es-data-1 and es-data-2.
Since we attached an extra disk to our data nodes, verify that you can see the disk:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 25G 0 disk
└─vda1 253:1 0 25G 0 part /
vdb 253:16 0 10G 0 disk <----
Provision the block device with xfs (or any other filesystem you prefer), create the directory where the elasticsearch data will reside, change the ownership so that elasticsearch has permission to read and write, configure the device to mount on startup, and mount the disk:
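These steps could look something like the following; the /data mount point matches the path.data: /data setting used in this post:

```bash
# format the attached disk with xfs
sudo mkfs.xfs /dev/vdb

# create the data directory and mount the disk on startup
sudo mkdir -p /data
echo '/dev/vdb /data xfs defaults 0 0' | sudo tee -a /etc/fstab
sudo mount /data

# give elasticsearch ownership of the data path
sudo chown -R elasticsearch:elasticsearch /data
```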
Bootstrap the elasticsearch config with the same cluster name, set node.name to an identifier (in this case I will use the server's hostname), set node.master to false as these will be data nodes, enable them as data nodes with node.data: true, configure path.data: /data to point to the path that we created, etc.:
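A data node sketch of /etc/elasticsearch/elasticsearch.yml, mirroring the master-node assumptions above (pre-7.x settings, example cluster name es-cluster):

```yaml
# /etc/elasticsearch/elasticsearch.yml (data node sketch)
cluster.name: es-cluster             # same cluster name as the masters
node.name: es-data-1                 # this server's hostname
node.master: false                   # not master-eligible
node.data: true                      # hosts the shards
path.data: /data                     # the disk we mounted earlier
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es-master-1", "es-master-2", "es-master-3"]
```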
Reload the systemd daemon, then enable and start elasticsearch. Allow it time to start, and check that the ports are listening with netstat -tulpn | grep 9200, then:
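The steps above as commands:

```bash
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# give it a moment to start, then verify the listener
netstat -tulpn | grep 9200
```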
Let’s ingest some data into elasticsearch. We will create an index named first-index with some dummy data about people: username, name, surname, location, and hobbies:
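A sketch of what the ingestion could look like; the field values below are illustrative examples of my own, using the 6.x-style API with a document type in the URL:

```bash
# index a couple of example documents (the values are illustrative)
curl -XPUT 'http://127.0.0.1:9200/first-index/people/1' \
  -H 'Content-Type: application/json' \
  -d '{"username": "mikes", "name": "mike", "surname": "steyn", "location": {"country": "south africa", "city": "cape town"}, "hobbies": ["sport"]}'

curl -XPUT 'http://127.0.0.1:9200/first-index/people/2' \
  -H 'Content-Type: application/json' \
  -d '{"username": "janed", "name": "jane", "surname": "doe", "location": {"country": "ireland", "city": "dublin"}, "hobbies": ["music", "reading"]}'
```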
Now that we have ingested our data into elasticsearch, let’s have a look at the Indices API, where the number of documents, size, etc. should be reflected:
$ curl http://127.0.0.1:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open first-index 1o6yM7tCSqagqoeihKM7_g 5 1 3 0 40.6kb 20.3kb
Now let’s request a search, which by default returns 10 documents:
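For example:

```bash
# search the index; with no query, up to 10 documents are returned by default
curl 'http://127.0.0.1:9200/first-index/_search?pretty'
```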
Let’s have a look at our shards using the Shards API. You will also see where each document is assigned to a specific shard, and whether it’s a primary or replica shard:
$ curl http://127.0.0.1:9200/_cat/shards?v
index shard prirep state docs store ip node
first-index 4 p STARTED 0 230b 10.163.68.7 es-data-2
first-index 4 r STARTED 0 230b 10.163.68.11 es-data-1
first-index 2 p STARTED 0 230b 10.163.68.7 es-data-2
first-index 2 r STARTED 0 230b 10.163.68.11 es-data-1
first-index 3 r STARTED 1 6.6kb 10.163.68.7 es-data-2
first-index 3 p STARTED 1 6.6kb 10.163.68.11 es-data-1
first-index 1 r STARTED 2 13kb 10.163.68.7 es-data-2
first-index 1 p STARTED 2 13kb 10.163.68.11 es-data-1
first-index 0 p STARTED 0 230b 10.163.68.7 es-data-2
first-index 0 r STARTED 0 230b 10.163.68.11 es-data-1
Then we can also use the Allocation API to see the size of our indices, disk space per node:
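For example:

```bash
# show shard counts, disk usage and disk space per node
curl 'http://127.0.0.1:9200/_cat/allocation?v'
```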
As I finish writing these posts, they will be published under the #elasticsearch-tutorials category on my blog, and for any other elasticsearch tutorials, you can find them under the #elasticsearch category.