Leader Election

When working in distributed environments, you sometimes need a guarantee that exactly one node, always the same one, performs a special task, regardless of changes in the cluster topology. Such a node is usually called the Leader. All nodes are configured in the same way, and leader election takes place at runtime. If the elected leader fails, the remaining nodes start the leader election process again and choose a new leader.
In this blog article, I will describe what leader election is, why it is needed, and how we can use Consul for leader election.
Why Use a Leader
A typical cloud application has many tasks acting in a coordinated manner. These tasks could all be instances running the same code and requiring access to the same resources, or they might be working together in parallel to perform the individual parts of a complex calculation. The task instances might run separately for much of the time, but it might also be necessary to coordinate their actions to ensure that they don't conflict, cause contention for shared resources, or accidentally interfere with the work other task instances are performing. Some potential use cases where a leader helps:
- Task Scheduling
- Queue Consumer
- Partitioning
- Task Coordination
Leader Election Methods and Algorithms
Election algorithms choose a service instance from a group of service instances to act as leader or coordinator and perform special tasks. If the leader crashes for any reason, the election algorithm is re-triggered and a new leader is elected from the remaining instances.
Here I will describe some well-known algorithms and techniques for leader election.
- Bully and Ring Algorithm
- Distributed coordination Services
1. Bully and Ring Algorithm
These election algorithms assume that every active instance in the service cluster has a unique priority number and that all instances are interconnected. The service instance with the highest priority is chosen as the new leader/coordinator. Hence, when a leader fails, the algorithm elects the active instance that has the highest priority number. This number is then sent to every active instance in the distributed system to let the other instances know about the new leader.
Detailed descriptions of both algorithms are easy to find in the distributed-systems literature.
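To make the idea concrete, here is a minimal single-process sketch of the Bully election logic in Python. It is an illustration only: in a real system the priority checks and the leader announcement would be network messages with timeouts, not method calls.

# A toy model of the Bully algorithm: every instance has a unique priority,
# and the highest-priority instance that is still alive wins the election.
class Instance:
    def __init__(self, priority, cluster):
        self.priority = priority   # unique priority number
        self.cluster = cluster     # all instances, alive or crashed
        self.alive = True
        self.leader = None

    def start_election(self):
        # "Bully" step: defer to any live instance with a higher priority.
        higher = [i for i in self.cluster if i.alive and i.priority > self.priority]
        if higher:
            max(higher, key=lambda i: i.priority).start_election()
        else:
            # No one outranks us: announce leadership to every live instance.
            for i in self.cluster:
                if i.alive:
                    i.leader = self

cluster = []
for p in range(1, 5):
    cluster.append(Instance(p, cluster))

cluster[3].alive = False       # the highest-priority instance crashes
cluster[0].start_election()    # an instance noticing the crash starts an election
print(cluster[0].leader.priority)  # -> 3, the highest priority still alive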
2. Distributed Coordination Services
A distributed coordination service provides reliable primitives to elect a leader for a distributed system. These systems work much like the synchronized keyword in Java: just as only the thread that holds the lock can execute the critical section, only the instance that gets ownership of the coordination key is treated as the leader (see the sketch after the list below). There are multiple services/tools that can be used as a distributed coordination service for leader election. A few are listed below.
- Consul
- Zookeeper
- Database
- Redis
- Random Walks
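To make the locking analogy concrete, here is a minimal single-process sketch using Python's threading.Lock as a stand-in for a distributed lock: whichever worker acquires the lock acts as the leader, and everyone else stays a follower. (The timings are illustrative; with a real coordination service, sessions keep the lock safe across process crashes, as we will see with Consul.)

import threading
import time

lock = threading.Lock()  # stand-in for the distributed coordination key

def worker(name):
    # Try to take ownership of the "coordination key" without blocking.
    if lock.acquire(blocking=False):
        print(f"{name} acquired the lock and is acting as leader")
        time.sleep(0.1)   # do leader-only work
        lock.release()    # give up leadership
    else:
        print(f"{name} could not acquire the lock and stays a follower")

threads = [threading.Thread(target=worker, args=(f"instance-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()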
Let’s focus on how to use Consul for leader election in the rest of this post.
How to Use Consul as a Coordination Service for Leader Election
Consul is a networking tool that provides a fully featured service-mesh control plane, service discovery, configuration, and segmentation; for more information, please visit the Consul documentation. On top of these core functionalities, Consul also provides a session mechanism that can be used to build distributed locks. Sessions act as a binding layer between nodes, health checks, and key/value data (the KV store is useful for storing service configuration or other metadata). Sessions are designed to provide granular locking.
Imagine we have a set of service instances that are attempting to acquire leadership for a given service. All participating service instances should agree on a common key to coordinate on. Let’s use the following KV store path on Consul as the coordination key: services/dev/<service name>/leader
High Level Architecture

Step 1 – Create Session: The first step is to create a session using the Session HTTP API. See the following sample API call to Consul, which returns a session ID.
$ curl -X PUT -d '{"Name": "my_service_name"}' http://localhost:8500/v1/session/create
{ "ID": "4ca8e74b-6350-7587-addf-a18084928f3c" }
Here, the API call responded with session ID 4ca8e74b-6350-7587-addf-a18084928f3c, and this session ID will be used for the further actions.
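The same call from code, as a hedged sketch using Python's requests library (an assumption; any HTTP client works). Setting a TTL on the session is optional but useful: if the instance dies and stops renewing the session (PUT /v1/session/renew/<id>), Consul invalidates it and releases any lock it holds.

import requests

CONSUL = "http://localhost:8500"

def create_session(service_name: str) -> str:
    # With a TTL set, the session must be renewed periodically or Consul
    # will invalidate it, which in turn releases the leader lock.
    resp = requests.put(
        f"{CONSUL}/v1/session/create",
        json={"Name": service_name, "TTL": "15s"},
    )
    resp.raise_for_status()
    return resp.json()["ID"]

session_id = create_session("my_service_name")
print(session_id)  # e.g. 4ca8e74b-6350-7587-addf-a18084928f3c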
Step 2 – Acquire Session Lock: The next step is to acquire a lock on the KV path for this instance, using the session created in the previous step. Have a look at the following sample API call. Here we send the current instance/node information (whatever information is needed to communicate with the leader) as the body content. It may be JSON (instance host, port, and a unique ID); this information becomes the value of our coordination key.
$ curl -X PUT -d @body.json http://localhost:8500/v1/kv/services/dev/my_service_name/leader?acquire=4ca8e74b-6350-7587-addf-a18084928f3c
Sample body.json =>
{
"serviceName": "my-service",
"nodeId" : "<unique-id-for-service-instance>",
"host" : "198.99.93.9",
"port" : 8080
}
This call returns either true or false. If true, the lock has been acquired and the calling instance is the leader. If false, some other instance has already acquired the lock, and that instance is the leader.
Note that the instance not only acquires the lock but also writes the body content of this API call, i.e. the calling instance's information (host, port, and unique ID). So whenever we wish to know who the leader is, we can read the coordination key's value from the KV store, which is nothing but the locking instance's information.
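Continuing the requests-based sketch from Step 1 (the helper below is illustrative, not part of the Consul API itself):

import json
import requests

CONSUL = "http://localhost:8500"
KEY = "services/dev/my_service_name/leader"

session_id = "4ca8e74b-6350-7587-addf-a18084928f3c"  # the ID returned in Step 1

def try_acquire_leadership(session_id: str, node_info: dict) -> bool:
    # The acquire operation atomically writes node_info as the key's value
    # and binds the key to our session; Consul answers with literal true/false.
    resp = requests.put(
        f"{CONSUL}/v1/kv/{KEY}",
        params={"acquire": session_id},
        data=json.dumps(node_info),
    )
    resp.raise_for_status()
    return resp.json() is True

node_info = {"serviceName": "my-service", "nodeId": "node-1",
             "host": "198.99.93.9", "port": 8080}
if try_acquire_leadership(session_id, node_info):
    print("I am the leader")
else:
    print("Another instance is the leader")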
Step 3 – Watch the Session: The next important step is to watch for any changes on the coordination key, because the lock might have been released, or the instance that acquired it might have failed or crashed. The leader also has to keep watching the coordination key, because its session might be invalidated by an operator. Watching for KV store changes is done via a blocking query against the key. If an instance ever notices that the Session field of the key is blank, there is no leader, and it should retry lock acquisition.
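Here is a sketch of such a watch loop, reusing try_acquire_leadership, session_id, and node_info from the Step 2 sketch. Consul blocking queries work by echoing the X-Consul-Index response header back as the index query parameter; the server then holds the request open until the key changes or the wait timeout elapses.

import time
import requests

CONSUL = "http://localhost:8500"
KEY = "services/dev/my_service_name/leader"

def watch_leadership(session_id: str, node_info: dict):
    index = 0
    while True:
        # Blocking query: returns when the key changes or after "wait".
        resp = requests.get(
            f"{CONSUL}/v1/kv/{KEY}",
            params={"index": index, "wait": "30s"},
        )
        index = int(resp.headers.get("X-Consul-Index", index))
        entries = resp.json() if resp.status_code == 200 else []  # 404 = key absent
        # No entry, or an entry with no bound Session, means there is
        # currently no leader: retry lock acquisition.
        if not entries or not entries[0].get("Session"):
            try_acquire_leadership(session_id, node_info)
            time.sleep(1)  # small back-off to avoid a hot retry loop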
Give up leadership: If a leader ever wants to give up leadership, it can do so by releasing the lock on the coordination key.
$ curl -X PUT http://localhost:8500/v1/kv/services/dev/my_service_name/leader?release=4ca8e74b-6350-7587-addf-a18084928f3c
To know who the leader is: Non-leader instances may wish to know who the leader is. An instance can find that out by reading the coordination key (the Consul key path). If the key has a value with a Session attached, there is a leader. The KV value is nothing but the node information we wrote in the Step 2 request when acquiring the lock on the coordination key.
$ curl -X GET http://localhost:8500/v1/kv/services/dev/my_service_name/leader
{
"LockIndex": 6,
"Key": "service/dm/core/dev/<service name>/leader",
"Flags": 0,
"Value": "eyJzZXJ2aWNlTmFtZSI6ImxlYWRlcmVsZWN0aW9udGVzdDIiLCJwb3J0Ijo3Nzc3LCJhZGRyZXNzIjoicHVuLW1vcmVyLW0iLCJub2RlSWQiOiJmMzMwZjRhZS1hYWM4LTRmYTEtYWNjYi1hZjk1ZGNmZGY0MzUifQ==",
"Session": "4ca8e74b-6350-7587-addf-a18084928f3c",
"CreateIndex": 26863,
"ModifyIndex": 27400
}
Remember, the value here is Base64 encoded. You can either decode it yourself, or get the decoded value straight from the KV store by appending the raw parameter to the REST call.
$ curl -X GET http://localhost:8500/v1/kv/services/dev/my_service_name/leader?raw
{ "serviceName": "my-service", "nodeId" : "<unique-id-for-service-instance>", "host" : "198.99.93.9", "port" : 8080 }
Consul is a good option for implementing leader election in your services, especially if it is already part of your infrastructure, given its scalability and reliability. The Consul session APIs are simple to use, and ready-made client libraries for the Consul APIs are available; have a look at the Consul libraries and SDKs.
I hope this article helped you understand what a leader is and how leader election works.