Kubernetes is an open-source container orchestration system for automating the deployment, scaling and management of containerised apps. And if you want to sound really 🆒, refer to it as k8s. In case you are wondering why it's called k8s, check out this brief explanation.
k8s' job is to run, track and monitor containers at scale, and it has become the de facto tool for container management. It is also the largest and fastest-growing open-source container orchestration software. This blog post is the first part of a series: Kubernetes for Python developers.
Our goal is to migrate a Celery app we developed in a previous blog post from Docker Compose to Kubernetes. You do not need any Kubernetes knowledge to follow this blog post, but you should have some experience with Docker. In this first part of the series, you will learn how to set up RabbitMQ as your Celery message broker on Kubernetes. You will learn about kubectl, the Kubernetes command line interface. And by the end of this article, you will know how to deploy a self-healing RabbitMQ application with a stable IP address and DNS name into the cluster.
To run Kubernetes on your machine, make sure it is enabled; Docker Desktop, for instance, ships with a single-node Kubernetes cluster that you can switch on in its settings. You can find instructions here.
kubectl
The first thing you need to know is kubectl, the Kubernetes command line tool. It is the docker-compose equivalent: it lets you interact with your Kubernetes cluster. For example, run kubectl cluster-info to get basic information about your cluster. Or run kubectl logs worker to get the stdout/stderr logs of a Pod named worker, very similar to docker-compose logs worker.
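If you are coming from Docker Compose, a rough mental mapping helps. The pairings below are approximations rather than exact equivalents (manifest.yaml and worker are placeholder names):
# docker-compose up -d         ->  kubectl apply -f manifest.yaml
# docker-compose ps            ->  kubectl get pods
# docker-compose logs worker   ->  kubectl logs worker
# docker-compose down          ->  kubectl delete -f manifest.yaml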
Pods
You cannot run a container directly on Kubernetes. A container always runs inside a Pod. A Pod is the smallest and most basic building block in the Kubernetes world. A Pod is an environment for a single container, or for a small number of tightly coupled containers (think log forwarding container). A Pod shares some of the properties of a Docker Compose service: it specifies the Docker image and command to run, and it allows you to define environment variables and memory and CPU resources.
Unlike a Docker Compose service, a Pod does not provide self-healing functionality. It is ephemeral: when a Pod dies, it's gone. Nor does a Pod come with DNS capabilities. Restarts are handled by so-called ReplicaSets, and network accessibility, DNS names and port mapping by so-called Service objects; we will cover both further down. Pods are much lower level than Docker Compose services. Let's create a RabbitMQ Pod. We use the RabbitMQ image from Docker Hub, tag 3.7.8.
# rabbitmq-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-pod
spec:
  containers:
    - name: rabbitmq-container
      image: rabbitmq:3.7.8
Create the Pod with kubectl and confirm it is up and running.
# apply rabbitmq-pod.yaml
~$ kubectl apply -f rabbitmq-pod.yaml
pod/rabbitmq-pod created
# list pods
~$ kubectl get pods
NAME           READY     STATUS    RESTARTS   AGE
rabbitmq-pod   1/1       Running   0          10s
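Before deleting the Pod again, two inspection commands are worth knowing: kubectl describe shows the Pod's configuration and recent events, and kubectl logs streams the container's stdout/stderr (output omitted here for brevity):
# show pod details and recent events
~$ kubectl describe pod rabbitmq-pod
# follow the RabbitMQ container logs
~$ kubectl logs -f rabbitmq-pod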
Delete the Pod and confirm it's gone.
# delete pod from rabbitmq-pod.yaml
~$ kubectl delete -f rabbitmq-pod.yaml
pod "rabbitmq-pod" deleted
# list pods
~$ kubectl get pods
No resources found.
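As an aside, the Pod spec is also where the environment variables and CPU/memory resources mentioned above are defined. Here is a minimal sketch; the filename is made up, and RABBITMQ_DEFAULT_USER / RABBITMQ_DEFAULT_PASS are environment variables understood by the official rabbitmq image:
# rabbitmq-pod-extended.yaml (illustrative sketch, not used below)
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-pod
spec:
  containers:
    - name: rabbitmq-container
      image: rabbitmq:3.7.8
      env:
        - name: RABBITMQ_DEFAULT_USER
          value: admin
        - name: RABBITMQ_DEFAULT_PASS
          value: secret        # don't hard-code real credentials; use a Secret
      resources:
        requests:              # the scheduler reserves this much for the Pod
          memory: "256Mi"
          cpu: "250m"
        limits:                # the container cannot exceed these values
          memory: "512Mi"
          cpu: "500m"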
ReplicaSets
When the container running inside a Pod dies, the Pod is gone. Pods do not self-heal, nor do they scale. The lack of self-healing capabilities means that it is usually not a good idea to create a Pod directly. This is where ReplicaSets come in. A ReplicaSet ensures that a specified number of Pod replicas is running at any given time. It is a management wrapper around a Pod: if a Pod that is managed by a ReplicaSet dies, the ReplicaSet brings up a new Pod instance.
# rabbitmq-rs.yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rabbitmq-rs
  labels:
    app: rabbitmq-rs
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8
Instead of a Pod yaml, we now create a ReplicaSet yaml. We define the Pod inside the .spec.template property of the ReplicaSet yaml; it is the RabbitMQ Pod manifest from above. That is, .spec.template has exactly the same schema as the Pod manifest, except that it is nested and does not have the apiVersion and kind properties. We also rearranged the Pod's metadata slightly: we now attach the label name: rabbitmq-pod to the RabbitMQ Pod. This matches the ReplicaSet's .spec.selector.matchLabels selector, which means the ReplicaSet can manage the RabbitMQ Pods. Finally, we set the number of RabbitMQ Pods we want to run concurrently in .spec.replicas to 1. Let's create the ReplicaSet with kubectl.
# apply rabbitmq-rs.yaml
~$ kubectl apply -f rabbitmq-rs.yaml
replicaset.apps/rabbitmq-rs created
# list replicasets
~$ kubectl get rs
NAME          DESIRED   CURRENT   READY     AGE
rabbitmq-rs   1         1         1         5s
# list pods
~$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
rabbitmq-rs-fxdqp   1/1       Running   0          7s
Let's find out what happens when we delete the Pod rabbitmq-rs-fxdqp.
# delete pod
~$ kubectl delete pod rabbitmq-rs-fxdqp
pod "rabbitmq-rs-fxdqp" deleted
# list pods
~$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
rabbitmq-rs-5sldl   1/1       Running   0          24s
What happened here? We deleted the ephemeral Pod rabbitmq-rs-fxdqp. The ReplicaSet noticed that the actual number of running RabbitMQ Pods had dropped to zero and brought up a new RabbitMQ Pod instance named rabbitmq-rs-5sldl. We have a self-healing RabbitMQ instance. Nice. Now, let's try and delete the ReplicaSet.
# delete replicaset from rabbitmq-rs.yaml
~$ kubectl delete -f rabbitmq-rs.yaml
replicaset.apps "rabbitmq-rs" deleted
# list replicasets
~$ kubectl get rs
No resources found.
# list pods
~$ kubectl get pods
No resources found.
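Self-healing is one half of what ReplicaSets give you; the replicas field is also the hook for horizontal scaling. Had we kept the ReplicaSet around, we could have scaled it imperatively like this (a sketch only; a RabbitMQ broker needs proper clustering configuration before running multiple replicas makes sense):
# scale the replicaset to 3 pod replicas
~$ kubectl scale replicaset rabbitmq-rs --replicas=3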
Deployments
Updating a ReplicaSet directly is only possible in an imperative way. It is much easier to declare the desired state and let Kubernetes work out how to get there. This is the use case for Deployments: a Deployment provides declarative updates for ReplicaSets and Pods. We create a Deployment, which creates a ReplicaSet, which in turn brings up one RabbitMQ Pod. In other words: ReplicaSets manage Pods, and Deployments manage ReplicaSets.
# rabbitmq-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8
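Apply this manifest first so that there is a live Deployment to update (the output line is the standard kubectl confirmation):
# apply rabbitmq-deploy.yaml
~$ kubectl apply -f rabbitmq-deploy.yaml
deployment.apps/rabbitmq-deploy created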
Now, let's say we need RabbitMQ with the management plugin. We need to replace rabbitmq:3.7.8 with rabbitmq:3.7.8-management. The new Deployment manifest defines the updated desired state for rabbitmq-deploy.
# rabbitmq-management-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8-management
Deploy the new Deployment version and see how it updates the ReplicaSet and Pod.
# apply rabbitmq-management-deploy.yaml
~$ kubectl apply -f rabbitmq-management-deploy.yaml
deployment.apps/rabbitmq-deploy configured
# list pods
~$ kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   1/1       Running             0          8m
rabbitmq-deploy-f98989967-qmxzn    0/1       ContainerCreating   0          2s
# list pods
~$ kubectl get pods
NAME                               READY     STATUS        RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   0/1       Terminating   0          8m
rabbitmq-deploy-f98989967-qmxzn    1/1       Running       0          19s
# list replicasets
~$ kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
rabbitmq-deploy-7f86fcd959   0         0         0         13m
rabbitmq-deploy-f98989967    1         1         1         1m
# get details for rabbitmq-deploy-f98989967-qmxzn pod
~$ kubectl get pod rabbitmq-deploy-f98989967-qmxzn -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-11-23T16:33:38Z
  generateName: rabbitmq-deploy-f98989967-
  labels:
    name: rabbitmq-pod
    pod-template-hash: "954545523"
  name: rabbitmq-deploy-f98989967-qmxzn
  namespace: default
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: rabbitmq-deploy-f98989967
    uid: 87be145f-ef3d-11e8-886a-025000000001
  resourceVersion: "594134"
  selfLink: /api/v1/namespaces/default/pods/rabbitmq-deploy-f98989967-qmxzn
  uid: 87c0e8ca-ef3d-11e8-886a-025000000001
spec:
  containers:
  - image: rabbitmq:3.7.8-management
    imagePullPolicy: IfNotPresent
    name: rabbitmq-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-r7js4
      readOnly: true
RabbitMQ 3.7.8-management is successfully deployed, replacing RabbitMQ 3.7.8 and giving you access to the RabbitMQ management plugin. You now know how to create and deploy a self-healing RabbitMQ Kubernetes instance!
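Deployments also keep a rollout history, which makes declarative updates observable and reversible. These are standard kubectl commands; the revisions you see depend on your cluster:
# watch the rollout until it completes
~$ kubectl rollout status deployment rabbitmq-deploy
# list previous revisions of the deployment
~$ kubectl rollout history deployment rabbitmq-deploy
# roll back to the previous revision, e.g. to rabbitmq:3.7.8
~$ kubectl rollout undo deployment rabbitmq-deploy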
Services
We still lack a stable Pod IP address or DNS name. Remember that Pods are not durable. When a Pod dies, the ReplicaSet creates a new Pod instance. The new Pod's IP address differs from the old Pod's IP address. In order to run a Celery worker Pod, we need a stable connection to the RabbitMQ Pod.
Enter Services. A Kubernetes Service is another Kubernetes object. A Service gets its own stable IP address, stable DNS name and stable port. Services provide service discovery, load-balancing, and features to support zero-downtime deployments. Two Service types matter here. A ClusterIP Service exposes Pods inside your cluster: apps in the cluster can reach it via a stable IP address, DNS name and port, but it is not accessible from outside the cluster. A NodePort Service provides everything a ClusterIP Service does and, in addition, access from outside the cluster.
Let's make the RabbitMQ Pod available inside the cluster under the service name rabbitmq and expose port 5672 for AMQP traffic. And let's expose the RabbitMQ management UI outside the cluster on port 30672.
# rabbitmq-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
spec:
  type: NodePort
  selector:
    name: rabbitmq-pod
  ports:
    - protocol: TCP
      port: 15672
      nodePort: 30672
      targetPort: 15672
      name: http
    - protocol: TCP
      port: 5672
      targetPort: 5672
      name: amqp
Deploy with kubectl and check the service's status:
# apply rabbitmq-service.yaml
~$ kubectl apply -f rabbitmq-service.yaml
service/rabbitmq created
# get details for rabbitmq service
~$ kubectl get service rabbitmq
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE
rabbitmq   NodePort   10.105.37.247   <none>        15672:30672/TCP,5672:32610/TCP   1m
The RabbitMQ management UI is now available at http://localhost:30672. From within the cluster, RabbitMQ is now reachable at amqp://guest:guest@rabbitmq:5672.
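To convince yourself that the DNS name works, resolve it from a throwaway Pod inside the cluster. busybox is just a convenient small image here, and --rm cleans the Pod up again afterwards:
# resolve the rabbitmq service name from inside the cluster
~$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup rabbitmq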
Next up
In this blog post, we built the foundations for migrating our Docker Compose Celery app to Kubernetes. We set up a self-healing RabbitMQ Deployment and a RabbitMQ service that gives us a stable URL. Now that we have a stable RabbitMQ URL, we can set up our Celery worker on Kubernetes. I will cover that in the next blog post.