Container Orchestration (Docker 103)
Coding (PHP 7.x)
Containerize your PHP application using Swarm and Kubernetes
A few days ago I attended a workshop at the Google Academy in London that illustrated some of the characteristics of Kubernetes and the concept of Docker orchestration.
Since this is a very hot topic at the moment, and working with containers is where the web development industry is moving, I thought it would be worth sharing some of the basic knowledge from the workshop here.
The process of managing and scheduling the tasks that containers living within multiple clusters need to perform is called container orchestration.
We have seen that when you run the docker run command with a PHP image you create an instance of that image.
If you need more than one, you just run more containers.
You then have to check the health, status and load of your applications.
If you see a problem with one of the containers you have to act fast, and you need to do it all by yourself.
You also need to check the host itself: what if the Docker host crashes and brings all the containers down with it?
In the previous episodes we managed 3 or 4 containers at once; a big application can have hundreds or thousands of them.
Joe Beda, senior staff software engineer at Google, said in an interview that his team runs over two billion containers per week. With a b!
Container orchestration is a set of tools, scripts and practices that help host containers in a production environment.
A typical solution is composed of multiple Docker hosts running several containers, so that if one application fails the others are still available.
Orchestration done right allows you to run hundreds of containers with a single command.
Some of these orchestration solutions also include an internal load balancer; this way they can automatically scale up when requests increase and scale down when they decrease.
You will read more about this feature below.
There are different orchestration solutions currently available to PHP developers.
The internal solution from Docker itself is called Docker Swarm; other solutions are Kubernetes, developed by Google, and Mesos, from Apache.
Each of these solutions has its strengths and weaknesses.
Docker Swarm, for example, is the easiest to set up, but at the same time it lacks some very useful features.
Mesos is the most advanced of the three, but it is really difficult for a beginner to use.
The middle ground, and the most popular solution at the moment, is the product created by Google: Kubernetes.
In this article, we will look in detail at the main features of Docker Swarm and of Kubernetes.
What you are reading is part of a 3-part series about containerizing PHP applications.
If you haven't read the previous parts, here are the quick links:
- Containerize your PHP application (Docker 101)
- Containerize your Laravel application (Docker 102)
- Container Orchestration (Docker 103)
Docker Swarm works by creating a higher-level abstraction over all the necessary Docker hosts and distributing services among them.
It can place containers by calculating availability and balancing the load across different systems and hardware.
To start the Docker Swarm setup you need several hosts with Docker installed on them.
The way it works is to have one host acting as a master (also called the Swarm manager) and the others acting as workers.
Once the Docker hosts are ready, you initialize the swarm by running the following command on the master.
docker swarm init
When this command is executed, you'll get a response containing the command that needs to run on the workers.
The command will look similar to this:
docker swarm join --token myToken
This command has to run on each worker.
After joining the master, the workers are also called nodes.
If everything went well, you can now create services and deploy them on the swarm cluster.
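To confirm that the cluster is up, Docker provides a command that lists every node that has joined the swarm (this is a minimal sketch; it assumes you already ran init and join as above):

```shell
# Run on the manager: lists all nodes in the swarm,
# with the manager marked as "Leader" in the MANAGER STATUS column
docker node ls
```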
We have seen that to run a container you type the docker run command.
In a system with multiple hosts, repeating the run command over and over again is neither scalable nor easy to do.
That is why Swarm allows us to create replicas that are automatically distributed among all the Docker hosts.
Here is the command:
docker service create --replicas=5 myImageName
This has more or less the same function as the docker run command.
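Once a service is running, you can change its number of replicas at any time without recreating it. A minimal sketch (the service name "myService" is hypothetical and is set with the --name flag at creation):

```shell
# Create a named service with 5 replicas spread across the swarm
docker service create --name myService --replicas=5 myImageName

# Later, scale the same service up to 10 replicas with one command
docker service scale myService=10
```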
This was just a brief introduction to Docker Swarm; now let's move on to Kubernetes.
Kubernetes was born as an internal project inside the offices of Google: they created an orchestration system initially called Borg.
This system was really effective but completely tied to Google's machines and data centres.
In 2015, Google and the Linux Foundation decided to rewrite a more flexible version of this system.
As we said before, Kubernetes is a bit more difficult to set up than Docker Swarm; however, once learned, it has many more features.
Kubernetes is also the most popular of these services, and it is supported by most cloud providers such as AWS, Microsoft Azure and, of course, Google Cloud.
Kubernetes, also called K8s, uses Docker hosts to host applications in the form of containers.
The architecture of K8S
Kubernetes works as a series of nodes connected together.
A node is a worker machine, and it is where containers are launched.
Having a system with a single node, even though entirely possible, does not make much sense because, in case of a problem, we lose that node and thus the whole application.
Like Docker Swarm, here we also have a special node, called the master, responsible for watching over the nodes within the cluster.
When we deploy an application, this set of nodes (at least one) and masters (at least one) is called a cluster.
What are these nodes?
A node can be an actual physical machine, an on-premises virtual machine or a VM in the cloud.
Nodes do not contain containers directly; instead, for design reasons, nodes contain one or more pods.
You can consider a pod a wrapper for containers.
The master node can manage and monitor the entire cluster, both at the pod and at the container level.
If there is a problem with a node we can move its workload to other nodes, or add and remove nodes as needed.
K8s clusters cannot manage more than 5,000 nodes, 150,000 pods and 300,000 containers.
I might say that unless you manage a billion-dollar company, you are safe choosing K8s for your system.
The master node
The master node is made of several components:
etcd
It is a highly available database, very different from MySQL or others because it has to keep the same data consistent across the whole cluster at all times.
To be able to do that, it is quite slow.
This component can be an internal part of the master or an external element; it only communicates with the API server.
The controller manager
It includes a set of different controllers: for example, the kube-controller-manager, which manages inactive nodes and ensures the right number is available, and the cloud-controller-manager, which manages the relationship with cloud infrastructures.
The API server
It is the component that lets the master communicate with the rest of the system (via JSON over HTTP APIs); it exposes the APIs that elements like kubectl or a user UI talk to.
The scheduler
It is in charge of placing the pods: thanks to etcd, this component knows, via the API server, the status of the cluster, and it checks whether newly created pods need to be assigned to nodes.
It also has information about the status of the hardware, so it can allocate more resources if needed.
The Worker node
As written above, a worker is a physical or virtual machine.
Each worker node must contain at least a container runtime (Docker is the most common).
In order to work, every node of this type must contain the following components:
The kubelet
It is the part that communicates with the master of the cluster; it sends requests to the master's API server.
It is also in charge of making sure the containers of the pod are running as expected, via a set of PodSpecs.
In case of issues, it tries to restart the pods.
The kube-proxy
It is the networking component of the node: it communicates with the network or the cloud and exposes the services the node provides.
It also communicates with the master node via the API server when there is a need to add or remove services or endpoints.
The container runtime
It is the software that makes the containers inside the pods run; the most popular is Docker, but there are others, such as containerd and rktlet, which implement the Kubernetes CRI (Container Runtime Interface).
Automated rollouts and rollbacks
When you deploy either an application or a change to its configuration, you have created a rollout; Kubernetes takes care of the update while making sure there is no downtime during the process.
It does that by continuously monitoring the health of the elements of the system.
Unfortunately, it is quite common to find out about mistakes only after the changes in our rollouts have taken place; the way to fix such an error is to roll back (restore the system) to a previous correct state.
Luckily for us, if K8s discovers that something went wrong, or is about to, it rolls back to the previous state automatically.
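A rollout and a manual rollback can be sketched with kubectl as follows (the deployment name "my-app" and the container name "php" are hypothetical):

```shell
# Trigger a rollout by changing the image of a deployment
kubectl set image deployment/my-app php=php:7.4-apache

# Watch the rollout progress until it completes or fails
kubectl rollout status deployment/my-app

# If something went wrong, restore the previous state
kubectl rollout undo deployment/my-app
```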
Automatic bin packing
During the creation of pods, the developer can specify the amount of CPU and RAM memory the containers need to use.
This allows the scheduler to choose which node to allocate the pods to.
This is very important because if Kubernetes knows the memory available on the servers and the amount consumed by the pods, it can better decide where to place them, making the application run faster.
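These amounts are declared per container in the pod manifest. A minimal sketch (the pod name and the values are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: php-app            # hypothetical pod name
spec:
  containers:
    - name: php
      image: php:7.4-apache
      resources:
        requests:          # minimum the scheduler reserves on a node
          memory: "128Mi"
          cpu: "250m"      # a quarter of a CPU core
        limits:            # hard cap the container may not exceed
          memory: "256Mi"
          cpu: "500m"
```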
Jobs
Kubernetes, using a YAML file, can run processes that schedule the creation of one or more pods.
During the job, several verification processes take place.
If the container or the whole pod fails, the job controller will schedule the creation of another pod.
Jobs are powerful tools because, to scale up an application, they can run either one after another or in parallel.
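A job is described in a manifest like the following sketch (the job name, image and command are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator      # hypothetical job name
spec:
  completions: 3              # run the pod to completion three times
  parallelism: 1              # one pod at a time; raise it to run them in parallel
  template:
    spec:
      containers:
        - name: report
          image: php:7.4-cli
          command: ["php", "-r", "echo 'report done';"]
      restartPolicy: OnFailure  # failed pods are recreated by the job controller
```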
Horizontal scaling
Chances are that if you are using, or thinking of using, Kubernetes, you are managing a fairly large application that needs a lot of resources.
One of the most important features of K8s is the possibility to automatically scale up or down the number of containers required.
You can do this via the command line, via the dashboard (Kubernetes UI), or automatically based on CPU percentage.
The way it works is pretty straightforward.
There is a manifest file (in YAML format) in which you can specify the number of replicas you want.
This number is checked by a controller called the replication controller, which creates multiple pods according to the number in the manifest file.
The pods created by the controller are monitored by another internal component called the Horizontal Pod Autoscaler.
The job performed by the HPA is to scale the number of pods based on selected metrics, such as (but not only) CPU utilization.
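Both manual and automatic scaling can be done from the command line. A sketch with kubectl (the deployment name "my-app" is hypothetical):

```shell
# Scale a deployment to a fixed number of replicas
kubectl scale deployment/my-app --replicas=10

# Or let the Horizontal Pod Autoscaler decide, keeping average
# CPU utilization around 80% with between 2 and 10 pods
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80
```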
Load balancing
We have already seen that containers and volumes run inside pods; K8s provides an IP address to each of these pods.
It also groups together all the pods that share the same set of functions.
With several pods accomplishing the same task, it is easy to connect to one that is available and fast to reach, rather than to one that is already managing an intense workload.
Redirecting the flow to the most suitable pods and making the application reliable is the job of the load balancer.
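Such a group of pods is exposed through a Service, which spreads traffic across every pod matching a label. A minimal sketch (service name, label and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: php-app            # hypothetical service name
spec:
  type: LoadBalancer       # ask the cloud provider for an external load balancer
  selector:
    app: php-app           # traffic is spread across all pods with this label
  ports:
    - port: 80             # port exposed by the service
      targetPort: 8080     # port the container listens on
```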
Self-healing
This feature is quite self-explanatory.
To be a reliable tool, Kubernetes has to use a lot of metrics that allow the system to check the state of the application as often as possible.
In certain cases, containers and nodes can die or fail to respond.
In these cases there are expedients that take place.
If a container fails, it is restarted automatically; if a container does not respond to a health check, it is killed by the system; and if a node dies, its containers are killed and rescheduled on another node.
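The health check mentioned above is declared as a probe on the container. A minimal sketch (the pod name and the /healthz endpoint are hypothetical and must exist in your application):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: php-app
spec:
  containers:
    - name: php
      image: php:7.4-apache
      livenessProbe:           # if this check fails, the container is restarted
        httpGet:
          path: /healthz       # hypothetical health endpoint of the application
          port: 80
        initialDelaySeconds: 10  # give the app time to boot before probing
        periodSeconds: 5         # probe every 5 seconds
```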
Service discovery
There are a few types of service discovery processes:
you can implement service discovery using environment variables, or (most likely) you are going to use DNS-based service discovery.
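With DNS-based discovery, every service gets a predictable DNS name that other pods can resolve. A sketch (the service name "php-app" and the namespace "production" are hypothetical):

```shell
# From inside any pod in the cluster, a service named "php-app"
# in the "production" namespace is reachable at:
curl http://php-app.production.svc.cluster.local
```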
Storage orchestration
As you have read in the previous episodes of this series, it is quite useful to store persistent data outside the container.
Just think: if you do not need a container anymore and you delete it, any data inside it will be gone alongside the container.
That's why there is the concept of a volume.
A volume is a directory that stores data outside the container, in a safer and more isolated place.
There are several types of storage that Kubernetes allows us to use: local storage, cloud storage (on AWS, for instance) or network storage (NFS).
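A volume is declared in the pod spec and mounted into the container. A minimal sketch using a directory on the node (pod name and paths are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: php-app
spec:
  containers:
    - name: php
      image: php:7.4-apache
      volumeMounts:
        - name: app-data
          mountPath: /var/www/data   # where the container sees the volume
  volumes:
    - name: app-data
      hostPath:
        path: /srv/app-data          # directory on the node, outside the container
```

The data in /srv/app-data survives even if the pod's container is deleted and recreated on the same node.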
This was a really basic overview of container orchestration and its main players.
These are very powerful tools that, if not managed properly, can break a system (almost) beyond repair.
Stan Lee used to say "With great power, there must also come great responsibility", and Docker Swarm and Kubernetes are two of the most powerful tools available right now to web developers who want to manage their containers professionally.
Where can you go from here?
Containerization is useful if we want to create applications; to do so, you actually need to know a programming language.
If you already know PHP, it may be time to level up your knowledge with the use of design patterns.