УДК 004

Kubernetes or Docker Swarm – which clustering tool to choose

Gladun Anastasia Mikhailovna – Master's degree in Applied Mathematics and Informatics, Lead Programmer at ATON LLC.

Abstract: Containerization technology enables running applications in separate, independent environments, so-called containers. Containers simplify deployment by isolating applications from one another and speed up the development process. However, a large number of containers is difficult to manage. Containers are not lightweight virtual machines; they are standardized executable packages used to deliver applications, including applications built on a microservice-based software architecture, together with all the components needed to run them. Orchestration systems, or clustering tools, are applied to deal with such challenges. Two clustering systems are considered in this article: Docker Swarm and Kubernetes. They are the best-known container orchestration platforms today. Each has its advantages and disadvantages, and each serves a specific purpose. This article reviews both options: choosing Kubernetes or choosing Docker Swarm.


Keywords: clustering tools, Kubernetes, Docker Swarm, scaling, network model.


Kubernetes is container orchestration software commonly used to manage large numbers of containers on top of a physical infrastructure. It was developed by Google, building on the company's experience of running many containers in production.


Figure 1. Kubernetes architecture.

The key Kubernetes features include service discovery and load balancing, storage orchestration, rollbacks and self-healing for container clusters, access control, and configuration management.

Docker Swarm mode manages clusters of Docker Engines natively within the Docker platform. The Docker CLI can be used to create a swarm, deploy application services to it, and manage the swarm's behavior.
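As a minimal sketch (the stack and service names here are illustrative, not from the article), a multi-container application can be described in a Compose file and handed to a swarm with the Docker CLI:

```yaml
# docker-stack.yml -- a minimal sketch; the stack and service names
# are illustrative. Deploy with:
#   docker swarm init
#   docker stack deploy -c docker-stack.yml demo
version: "3.8"
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"            # published through the swarm routing mesh
    deploy:
      replicas: 3          # Swarm schedules three tasks across the nodes
      restart_policy:
        condition: on-failure
```

Scaling the running service afterwards is a single CLI command, for example docker service scale demo_web=5.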


Figure 2. Docker Swarm architecture.

Swarm coordinates containers and assigns tasks to groups of containers. It also checks container health and manages the container lifecycle, provides redundancy and failover in the event of a node failure, performs rolling software updates, and scales containers up and down according to the actual load.

A comparative analysis of the technical capabilities of Docker Swarm and Kubernetes, identifying the pros and cons of both systems, is given below.

Table 1. Comparing technical capabilities.

Application definition

Kubernetes: An application can be deployed using a combination of pods, deployments, and services (or microservices). A pod is a group of co-located containers and is the atomic unit of deployment. A deployment can have replicas on multiple nodes. A service exposes the container workload to the outside and integrates with DNS to round-robin incoming requests.

Docker Swarm: An application can be deployed as services (or microservices) in a swarm cluster. A multi-container application can be specified in a YAML file. Tasks (instances of a service running on a node) can be distributed across data centers using labels. Several placement preferences can be used to distribute tasks further, for example, to a particular rack in a data center.

Application scalability

Kubernetes: Each application tier is defined as a pod and can be scaled when managed by a deployment, which is specified in YAML. Scaling can be manual or automatic. Pods can be used to run vertically integrated application stacks such as LAMP (Apache, MySQL, PHP) or ELK/Elastic (Elasticsearch, Logstash, Kibana), co-located and co-managed applications such as content management systems, and auxiliary applications for backup, checkpointing, compression, rotation, and snapshotting.

Docker Swarm: Services can be scaled using Docker Compose YAML templates. Services can be global or replicated. Global services run on all nodes; replicated services run the specified number of replicas (tasks) across the nodes. For example, a three-replica MySQL service will run on up to three nodes. Tasks can be scaled up or down, and deployed in parallel or sequentially.

High availability

Kubernetes: Deployments distribute pods across nodes for high availability, making the application tolerant of instance failures. Load-balancing services detect unhealthy pods and remove them. High availability of Kubernetes itself is also supported: requests from kubectl and clients can be balanced across several master nodes, etcd can be clustered, and the API servers can be replicated.

Docker Swarm: Services can be replicated across swarm nodes. Swarm managers are responsible for the entire cluster and manage the resources of the worker nodes. Managers use ingress load balancing to expose services to the outside. Swarm managers use the Raft consensus algorithm to keep a consistent view of the cluster state. An odd number of managers is recommended, and a majority of managers must remain available for the swarm to keep functioning (2 of 3, 3 of 5, etc.).

Networking

Kubernetes: The network model is a flat network that lets pods communicate with one another; network policies determine how pods may do so. The flat network is typically implemented as an overlay. The model requires two CIDRs: one from which pods get their IP addresses, and another for services.

Docker Swarm: A node joining a swarm cluster creates an overlay network for services that spans all hosts in the swarm, plus a host-only Docker bridge network for containers. By default, swarm nodes encrypt overlay control traffic; users can choose to also encrypt container data traffic when creating an overlay network themselves.
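As a sketch of the last point (network and service names here are illustrative), data-traffic encryption can be requested when the overlay network is defined, for example in a Compose file:

```yaml
# Compose fragment -- an overlay network with data-plane encryption
# (network and service names are illustrative). By default only the
# control traffic is encrypted; the "encrypted" driver option also
# encrypts container-to-container data traffic.
version: "3.8"
services:
  web:
    image: nginx:alpine
    networks:
      - app-net
networks:
  app-net:
    driver: overlay
    driver_opts:
      encrypted: ""
```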

Unlike Kubernetes, Docker Swarm does not come with a ready-made web interface for application deployment and container orchestration. However, with its growing popularity, several third-party tools have appeared that offer simple and multifunctional graphical interfaces for Docker Swarm. Well-known Docker Swarm user interface tools include Portainer, DockStation, Swarmpit, and Shipyard.

Notably, both approaches to Kubernetes high availability rely on kubeadm and multiple masters, with the etcd cluster nodes kept either outside or inside the control plane (the external and stacked etcd topologies, respectively).
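A minimal kubeadm configuration sketch is shown below (the endpoint address and CIDRs are assumptions, not from the article); it illustrates both the multi-master entry point and the two CIDRs mentioned in Table 1:

```yaml
# kubeadm-config.yaml -- illustrative only; applied with:
#   kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "lb.example.local:6443"   # load balancer in front of the masters
# etcd is stacked (co-located with the control plane) by default;
# an external etcd cluster would be declared under "etcd: external:".
networking:
  podSubnet: "10.244.0.0/16"      # CIDR from which pods get their IP addresses
  serviceSubnet: "10.96.0.0/12"   # CIDR for cluster services
```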

Unlike Kubernetes, Docker Swarm does not offer a ready-made monitoring solution, so you have to rely on third-party applications for Docker Swarm monitoring. Docker Swarm monitoring is generally considered more complex than monitoring a K8s cluster because of the larger number of cross-node entities and services involved.

The objectives of Kubernetes and Docker Swarm are very similar. But, as already noted, there are fundamental differences in how the two operate. Ultimately, both solutions solve complex problems to make digital transformation realistic and effective.

When implementing complex, heterogeneous IT systems, virtualization using a microservice architecture is increasingly preferred. This is the youngest form of service-oriented architecture and has become widespread over the last three years. According to DZone, 53% of enterprises use microservices in development or production [1]. This implementation form has a number of fundamental problems related to information security, in particular system fault tolerance, which have not yet been fully studied and pose a threat to microservice operation. In the article "Methodology for designing fault-tolerant systems based on a microservice architecture," the authors found that one of the main reasons for operational difficulties with such architectures is the lack of uniform, formalized fault-tolerance requirements [1].

In this regard, the authors developed a methodology for designing microservice-based fault-tolerant systems. It can be implemented in several ways depending on the characteristics of the information system, the availability and qualifications of personnel, and internal goals and tasks.

According to the methodology, in order to run uninterruptedly and optimally, the system must meet the following requirements:

  1. Replication.
  2. Reduced time for overhead operations.
  3. Resource management.
  4. Component status monitoring.
  5. Reduced failure recovery time.
  6. Automated system management, decrease in the role of a human to set up and recover operability [2].

The following criteria were selected for evaluating the implementation and effectiveness of the methodology:

  1. degree of system control automation;
  2. system recovery time;
  3. service recovery time;
  4. service group recovery time;
  5. server failure recovery time;
  6. incident notification time.

To meet the above requirements, the system shall consist of a microservice orchestration subsystem, a log collection subsystem, a status monitoring subsystem, a data storage subsystem, and a kernel.

The Kubernetes framework was chosen as the service orchestration subsystem. Kubernetes is not bound to a hardware infrastructure and presents the entire data center as a single computing resource. It deploys and runs software without requiring knowledge of the individual servers. When deploying a multi-component application, Kubernetes independently selects a server for each component, installs the component, and provides its visibility and connectivity to the other application components [2]. One of the basic Kubernetes concepts is the node: a physical node of the system on which the various elements are placed. The test implementation of the system uses three expandable master nodes, three expandable worker nodes, and three expandable etcd servers.

The master nodes host the system services required for Kubernetes to run. These services monitor the status of all components located on the worker nodes.

The main part of the system resides on the worker nodes: containers (an image together with the command run inside it) hosting the microservices. In Kubernetes, a group of containers with common characteristics (IP address, network, shared data stores, labels, etc.) is referred to as a pod. If a pod stops responding to requests from the system services located on the master nodes, it is restarted. If an entire node stops responding, within about a minute all services located on it are restarted on another node.
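As an illustration of this restart behavior (the image name and health endpoint below are hypothetical), a deployment can attach a liveness probe to its pods, so that Kubernetes restarts any pod whose probe keeps failing; a minimal sketch:

```yaml
# demo-deployment.yaml -- illustrative only: the image, port, and
# /healthz endpoint are assumptions. The kubelet polls the probe;
# if it keeps failing, Kubernetes restarts the pod automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service
spec:
  replicas: 3                      # the deployment keeps three pods running
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      containers:
        - name: app
          image: registry.example.local/demo-service:1.0
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
```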


Figure 3. High-availability cluster test implementation.

Testing of the system showed that at least two running etcd servers are required for correct cluster operation. etcd is a distributed key-value store used as the database for Kubernetes.
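For illustration (host names and addresses are assumptions), one member of such a three-node etcd cluster could be configured as follows; with three members, quorum survives the loss of any single one:

```yaml
# etcd.conf.yml for one of three etcd members (names and addresses
# are assumptions). With three members, the cluster keeps quorum as
# long as any two of them are running.
name: etcd-1
data-dir: /var/lib/etcd
listen-peer-urls: http://10.0.0.11:2380
listen-client-urls: http://10.0.0.11:2379,http://127.0.0.1:2379
initial-advertise-peer-urls: http://10.0.0.11:2380
advertise-client-urls: http://10.0.0.11:2379
initial-cluster: etcd-1=http://10.0.0.11:2380,etcd-2=http://10.0.0.12:2380,etcd-3=http://10.0.0.13:2380
initial-cluster-state: new
initial-cluster-token: test-etcd-cluster
```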

Test systems for monitoring and for centralized log collection were implemented to track the condition of system components and to reduce the man-hours spent on troubleshooting.

The monitoring system is based on the Grafana subsystem for graphical display and InfluxDB as the monitoring data storage.

InfluxDB is an open-source, schema-free database for storing time-series records, with optional closed-source components, developed by InfluxData. Currently, InfluxDB uses a purpose-built storage data structure: the time-structured merge tree (TSM tree). According to research conducted by employees of the Free University of Brussels, this format is not subject to the deletion problems of its predecessors and provides disk compression up to 45 times more efficient than a B+ tree [3].

Grafana enables querying, visualizing, alerting on, and receiving metrics regardless of where they are stored. This subsystem enables optimal management of resources, users, and configuration in large distributed systems that span several large departments or separate companies.

To collect metrics, the ready-made Heapster solution was chosen; it collects and interprets various data, such as compute resource usage and lifecycle events, and is integrated with Kubernetes. In addition to system information, it collects data on the services running in Kubernetes and sends all collected metrics to InfluxDB. Using the metrics stored in InfluxDB, graphs can be built in Grafana, with thresholds set at critical values to notify the administrator. Notification can be delivered in different ways, including through a messenger, which enables a quick response to a security incident.
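As a sketch of how Grafana can be pointed at this storage (the URL and database name are assumptions), a provisioned data source might look like:

```yaml
# /etc/grafana/provisioning/datasources/influxdb.yml -- illustrative
# only; the URL and database name are assumptions.
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb.monitoring.svc:8086   # assumed in-cluster address
    database: k8s_metrics                      # assumed metrics database
    isDefault: true
```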

The following subsystems were selected for the centralized logging system in the test environment: Kibana to display incoming events, Elasticsearch as the log record storage, and Fluentd as the log aggregator.

The Elasticsearch subsystem performs full-text search and analysis of log records, making routine use of distributed computing and distributed data storage.

Bharvi Dixit noted that Elasticsearch is based on the REST architecture and allows not only performing CRUD operations over HTTP but also monitoring a cluster through the REST API [5]. It is designed for horizontal rather than vertical scaling. A system design can start with a single-node Elasticsearch cluster on a laptop and grow into hundreds of thousands of nodes without the designer having to worry about the complexity inherent in distributed computing, distributed document storage, and distributed search.

The Kibana subsystem is a web application that presents the data processed by Elasticsearch. It does not load data from Elasticsearch for further processing but instead uses Elasticsearch's capacity to perform all resource-intensive tasks, which ensures that information is displayed in real time. As the amount of data grows, the Elasticsearch cluster is scaled so that latency stays minimal, in accordance with the SLA [3].

The first step in logging system configuration was to start and configure the Fluentd service. It is configured to collect logs from all containers running on nodes and send them to Elasticsearch.

The source for log collection is the "/var/log/containers/" directory, to which all microservices write their logs.
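A hedged sketch of this collection step is shown below (the image tag and Elasticsearch address are assumptions): a Fluentd DaemonSet runs one collector pod per node and mounts the log directory from the host.

```yaml
# fluentd-daemonset.yaml -- an illustrative sketch. One Fluentd pod
# runs per node, reads container logs from /var/log/containers, and
# ships them to Elasticsearch. Note that /var/log/containers holds
# symlinks, so complete setups also mount /var/log/pods and the
# container runtime's log directory.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch.logging.svc   # assumed service address
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlogcontainers
              mountPath: /var/log/containers
              readOnly: true
      volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
```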

Elasticsearch is then launched; in it, indices (storages) are created automatically every day, receiving all the logs collected by Fluentd as well as logs sent directly by microservices. In Kibana these logs can be filtered by index or by any field and viewed in a convenient form.

To meet the data replication requirement, a database cluster (a set of databases managed by one instance of a running server) was implemented on master-slave PostgreSQL servers in the following form: several virtual machines, one of which is active, while the rest passively copy its data in order to replace the primary in case of failure.

The experimental cluster prototype was implemented as follows:

  1. Installing the PostgreSQL database software, the repmgr replication manager, and the pgbouncer connection pooler.
  2. Setting up passwordless ssh authentication between all nodes and to the server via the postgres user.
  3. Configuring the environment (the Linux PATH variable to point to the postgres executable, the PGDATA variable to point to the configuration files and PostgreSQL data).
  4. For the master (see the configuration sketch after this list):
  • specifying a parameter to accept requests from the required IP addresses;
  • setting the hot_standby logging level, so that enough information is logged to recover transactions while keeping cluster performance sufficient and avoiding information overload;
  • enabling archiving to transfer completed write-ahead-log (WAL) segments to the storage;
  • disabling local WAL segment retention (setting the value to 0 in the configuration file);
  • setting the required number of replication slots supported by the server (3 slots were selected for the test system);
  • enabling connections to the server for SQL queries during recovery;
  • setting the maximum number of concurrently operating WAL sender processes and the maximum number of connections;
  • setting the command to archive a completed WAL segment, the port, and additional parameters;
  • setting up trusted networks for data transfer in the pg_hba.conf file;
  • creating a repmgr role with superuser privileges and without privilege inheritance (that is, the role will not inherit the privileges of the roles it is a member of);
  • allowing repmgr to log on to the server (the role can be the initial authorized login name when a client connects);
  • creating a database for repmgr and granting this user all permissions on the created database;
  • configuring the cluster name;
  • configuring the node number, name, and the replication slot for the master;
  • setting up connection information and specifying for repmgr the path to the postgresql executable;
  • registering the server as the master.
  5. For the slave:
  • configuring the cluster name;
  • configuring the node number, name, and the replication slot to use;
  • setting up connection information and specifying for repmgr the path to the postgresql executable;
  • registering the standby server with the cluster.
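The master-side settings from this list can be collected into configuration fragments. The sketch below is illustrative only: paths, addresses, and several parameter names are assumptions and vary across PostgreSQL and repmgr versions.

```
# --- postgresql.conf fragments on the master (illustrative values;
#     wal_level = hot_standby was renamed to "replica" in 9.6+) ---
listen_addresses = '*'          # accept requests from the required IP addresses
port = 5432
wal_level = hot_standby         # log enough information to recover transactions
archive_mode = on               # ship completed WAL segments to the storage
archive_command = 'cp %p /mnt/wal_archive/%f'
wal_keep_segments = 0           # do not retain extra WAL segments locally
max_replication_slots = 3       # three slots were selected for the test system
hot_standby = on                # allow SQL queries during recovery
max_wal_senders = 4             # concurrent WAL sender processes
max_connections = 100

# --- pg_hba.conf: trusted networks for replication (illustrative) ---
# host  replication  repmgr  10.0.0.0/24  trust

# --- SQL run once on the master, as described in the list ---
# CREATE ROLE repmgr SUPERUSER NOINHERIT LOGIN;
# CREATE DATABASE repmgr OWNER repmgr;

# --- repmgr.conf on the master (node 1) ---
cluster = 'test_cluster'        # accepted by older repmgr releases only
node_id = 1
node_name = 'pg-master'
conninfo = 'host=pg-master user=repmgr dbname=repmgr'
pg_bindir = '/usr/lib/postgresql/12/bin'

# --- registration ("master" is called "primary" in repmgr 4+) ---
#   repmgr -f /etc/repmgr.conf master register     # on the master
#   repmgr -f /etc/repmgr.conf standby register    # on each standby
```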

The methodology testing results are given in the Table below:

Table 2. The methodology testing results.

  Indicator                                Before                                After
  Level of automation in system control    Not automated                         Automated
  System recovery time                     ~5-10 min                             ~2 min
  Service recovery time                    ~3-7 min                              ~0.5-1.5 min
  Service group recovery time              (3 + 0.4 * number of services) min    ~0.5-1.5 min
  Server failure recovery time             ~4-6 min                              ~1-2 min
  Incident notification time               -                                     ~15-20 sec

The test implementation showed the effectiveness of this approach to configuring a system with a microservice architecture. The resulting system demonstrated acceptable fault tolerance, automatically recovering a service within 40-60 seconds and the entire architecture within 1.5 minutes, 3-7 times faster than the original system before the methodology was applied. The logging and monitoring subsystems reported incidents in a timely manner and made it possible to identify and eliminate the cause of a fault when required. Timely notifications reduced the time to detect an incident to 15-20 seconds (as opposed to the unmeasured time it would take a user or an employee to notice the problem). The methodology was tested on architectures of 10, 20, and 30 microservices (see Table 2).

The approach demonstrated a significant reduction in the time and human resources spent on system maintenance, as well as applicability to any microservice architecture regardless of its scale and number of services. The notification system reported faults, and the need to involve a qualified employee, in a timely manner. All the requirements for defining the system as fault tolerant have been met. In the future, it is planned to improve the methodology and adapt it to other orchestration systems.

Conclusion

Two major clustering tools, Kubernetes and Docker Swarm, were considered in this article. Docker Swarm is a simple, easy-to-use orchestration tool, but its capabilities are limited compared to Kubernetes. Kubernetes, in contrast, is a complex but powerful tool that provides self-healing and out-of-the-box auto-scaling capabilities.

Ultimately, which platform to choose largely depends on the specifics of the business needs.

References

  1. Zhu C., Han B., Zhao Y. A comparative study on Spark performance in Kubernetes // The Journal of Supercomputing. 2022.
  2. Leena Sri R., Vetriveeran D. Kubernetes for fog computing – limitations and research scope // Lecture Notes in Networks and Systems. 2022. Vol. 419 LNNS. pp. 351-361.
  3. Shulyak A.V. Scaling resources using Kubernetes, Docker Swarm // Young Scientist. 2022. No. 31 (426). pp. 8-13.
