Kubernetes for Celery Developers

Kubernetes is an open-source container-orchestration system for automating deployment, scaling and management of containerised apps. And if you want to sound really 🆒, you should refer to it as k8s. Just in case you want to know why it's called k8s, check out this brief explanation.

k8s' job is to run, track and monitor containers at scale. It has become the de facto tool for container management. Kubernetes is the largest and fastest growing open-source container orchestration software. This blog post is the first part of a series: Kubernetes for Python developers.

Our goal is to migrate a Celery app we developed in a previous blog post from Docker Compose to Kubernetes. You do not need any Kubernetes knowledge to follow this blog post, but you should have some experience with Docker. In this first part of the series, you will learn how to set up RabbitMQ as your Celery message broker on Kubernetes. You will learn about kubectl, the Kubernetes command line interface. And by the end of this article you will know how to deploy a self-healing RabbitMQ application with a stable IP address and DNS name into the cluster.

To run Kubernetes on your machine, make sure to enable it. You can find instructions here.

screenshot

kubectl

The first thing you need to know is kubectl. kubectl is the Kubernetes command line tool. It is the docker-compose equivalent and lets you interact with your Kubernetes cluster. For example, run kubectl cluster-info to get basic information about your Kubernetes cluster. Or run kubectl logs worker to get the stdout/stderr logs of a Pod named worker, very similar to docker-compose logs worker.
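
If you would rather drive kubectl from Python than from the shell, a thin wrapper around subprocess is usually enough. The following is a minimal sketch, not part of the original setup. The helper names kubectl_args and kubectl are made up for this example, and it assumes kubectl is installed and on your PATH.

```python
import shutil
import subprocess


def kubectl_args(*args):
    """Build the argument list for a kubectl call, e.g. ("logs", "worker")."""
    return ["kubectl", *args]


def kubectl(*args):
    """Run kubectl and return its stdout as text.

    Hypothetical helper: assumes kubectl is installed and points at a cluster.
    Raises subprocess.CalledProcessError if kubectl exits with an error.
    """
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl not found on PATH")
    result = subprocess.run(
        kubectl_args(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# Usage, equivalent to running `kubectl cluster-info` in a shell:
#     print(kubectl("cluster-info"))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues.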

screenshot

Pods

You cannot run a container directly on Kubernetes. A container must always run inside a Pod. A Pod is the smallest and most basic building block in the Kubernetes world. A Pod is an environment for a single container, or for a small number of tightly coupled containers (think of a log forwarding container next to your app). A Pod shares some of the properties of a Docker Compose service. A Pod specifies the Docker image and command to run. It allows you to define environment variables, memory and CPU resources.

Unlike a Docker Compose service, a Pod does not provide self-healing functionality. It is ephemeral. When a Pod dies, it's gone. Nor does a Pod come with a stable DNS name. Survival and recreation are handled by ReplicaSets and Deployments. Network accessibility, port mapping and all that is handled by a so-called Service object. We will cover all of these further down. Pods are much lower level compared to Docker Compose services. Let's create a RabbitMQ Pod. We use the RabbitMQ image from Docker Hub, tag 3.7.8.

# rabbitmq-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-pod
spec:
  containers:
    - name: rabbitmq-container
      image: rabbitmq:3.7.8
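
kubectl accepts JSON manifests as well as YAML, so the same Pod can be generated from Python with the standard library only. The sketch below mirrors rabbitmq-pod.yaml and adds the environment variable and resource settings mentioned earlier. The env and resources values are illustrative assumptions, not part of the original manifest.

```python
import json

# Same Pod as rabbitmq-pod.yaml, written as a Python dict.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rabbitmq-pod"},
    "spec": {
        "containers": [
            {
                "name": "rabbitmq-container",
                "image": "rabbitmq:3.7.8",
                # Illustrative values only, not from the original manifest.
                "env": [{"name": "RABBITMQ_DEFAULT_USER", "value": "guest"}],
                "resources": {
                    "requests": {"memory": "256Mi", "cpu": "250m"},
                    "limits": {"memory": "512Mi", "cpu": "500m"},
                },
            }
        ]
    },
}

# Serialise to JSON; the result can be saved to a .json file for kubectl.
manifest = json.dumps(pod, indent=2)
print(manifest)
```

Save the output to a file such as rabbitmq-pod.json (a hypothetical file name) and apply it with kubectl apply -f rabbitmq-pod.json.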

Create the Pod with kubectl and confirm it is up and running.

# apply rabbitmq-pod.yaml
~$ kubectl apply -f rabbitmq-pod.yaml
pod/rabbitmq-pod created

# list pods
~$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
rabbitmq-pod                    1/1     Running   0          10s

Delete the Pod and confirm it's gone.

# delete pod from rabbitmq-pod.yaml
~$ kubectl delete -f rabbitmq-pod.yaml
pod "rabbitmq-pod" deleted

# list pods
~$ kubectl get pods
No resources found.
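
If you want to check the Pod status from a Python script, the table printed by kubectl get pods can be parsed with a few lines. This is only a sketch under the assumption that no cell contains spaces, as in the output shown above. The function name parse_kubectl_table is made up for this example.

```python
def parse_kubectl_table(text):
    """Turn whitespace-aligned kubectl output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumes no cell contains spaces, as in the sample output below.
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
NAME                            READY   STATUS    RESTARTS   AGE
rabbitmq-pod                    1/1     Running   0          10s
"""

pods = parse_kubectl_table(sample)
print(pods[0]["NAME"], pods[0]["STATUS"])
```

A message such as No resources found. has no rows below its first line, so it yields an empty list. For anything beyond a quick check, kubectl get pods -o json is the more robust input to parse.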

ReplicaSets

When a Pod is deleted, or the node it runs on fails, the Pod is gone. Nothing recreates it, so Pods do not self-heal. Nor do they scale. The lack of self-healing capabilities means that it is not a good idea to create a Pod directly. This is where ReplicaSets come in. A ReplicaSet ensures that a specified number of Pod replicas are running at any given time. A ReplicaSet is a management wrapper around a Pod. If a Pod that is managed by a ReplicaSet dies, the ReplicaSet brings up a new Pod instance.

# rabbitmq-rs.yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rabbitmq-rs
  labels:
    app: rabbitmq-rs
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8

Instead of a Pod YAML, we now create a ReplicaSet YAML. We define the Pod inside the .spec.template property of the ReplicaSet YAML, which is the RabbitMQ Pod manifest from above. That is, .spec.template has exactly the same schema as the Pod manifest, except that it is nested and does not have the apiVersion and kind properties. We also rearranged the Pod's metadata slightly. We now attach the label name: rabbitmq-pod to the RabbitMQ Pod. This matches the ReplicaSet's .spec.selector.matchLabels selector, which means the ReplicaSet can manage the RabbitMQ Pods. We set the number of RabbitMQ Pods we want to run concurrently in .spec.replicas to 1. Let's create the ReplicaSet with kubectl.
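
The matching rule behind matchLabels is simple: every key/value pair in the selector must be present in the Pod's labels, and extra Pod labels are ignored. A minimal Python sketch of that rule follows. The function name selector_matches and the tier and celery-worker labels are made up for illustration.

```python
def selector_matches(match_labels, pod_labels):
    """True if every key/value pair in match_labels is present in pod_labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"name": "rabbitmq-pod"}

# Exact match: the Pod is managed by the ReplicaSet.
print(selector_matches(selector, {"name": "rabbitmq-pod"}))  # True

# Extra labels on the Pod do not prevent a match.
print(selector_matches(selector, {"name": "rabbitmq-pod", "tier": "broker"}))  # True

# A different label value means the Pod is not selected.
print(selector_matches(selector, {"name": "celery-worker"}))  # False
```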

# apply rabbitmq-rs.yaml
~$ kubectl apply -f rabbitmq-rs.yaml
replicaset.apps/rabbitmq-rs created

# list replicasets
~$ kubectl get rs
NAME          DESIRED   CURRENT   READY   AGE
rabbitmq-rs   1         1         1       5s

# list pods
~$ kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-rs-fxdqp   1/1     Running   0          7s

Let's find out what happens when we delete the Pod rabbitmq-rs-fxdqp.

# delete pod
~$ kubectl delete pod rabbitmq-rs-fxdqp
pod "rabbitmq-rs-fxdqp" deleted

# list pods
~$ kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-rs-5sldl   1/1     Running   0          24s

What happened here? We deleted the ephemeral Pod rabbitmq-rs-fxdqp. The ReplicaSet then noticed that the actual number of RabbitMQ Pods running was 0, one short of the desired count of 1. And it created a new RabbitMQ Pod instance named rabbitmq-rs-5sldl. We have a self-healing RabbitMQ instance. Nice. Now, let's try to delete the ReplicaSet.

# delete replicaset from rabbitmq-rs.yaml
~$ kubectl delete -f rabbitmq-rs.yaml
replicaset.apps "rabbitmq-rs" deleted

# list replicasets
~$ kubectl get rs
No resources found.

# list pods
~$ kubectl get pods
No resources found.
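
The self-healing behaviour seen in this section boils down to comparing the desired number of Pods with the actual number and closing the gap. The following is a toy model of that idea in Python, not how the real ReplicaSet controller is implemented. The function reconcile and its random five-character suffix are assumptions made for this sketch.

```python
import random
import string


def reconcile(name, desired, running):
    """Return the Pod names after one reconciliation pass.

    Toy model of a ReplicaSet: it only compares counts.
    """
    pods = list(running)
    alphabet = string.ascii_lowercase + string.digits
    # Too few Pods: create new ones with a random suffix.
    while len(pods) < desired:
        suffix = "".join(random.choices(alphabet, k=5))
        pods.append(f"{name}-{suffix}")
    # Too many Pods: keep only the first `desired` ones.
    return pods[:desired]


# The only Pod was deleted, so nothing is running any more.
pods = reconcile("rabbitmq-rs", desired=1, running=[])
print(pods)  # one freshly named Pod

# Running the pass again changes nothing: actual already equals desired.
print(reconcile("rabbitmq-rs", desired=1, running=pods) == pods)  # True
```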

Deployments

Deploying ReplicaSet updates directly is only possible in an imperative way. It is much easier to define the desired state. This is the use case for Deployments. A Deployment provides declarative updates for ReplicaSets and Pods. Create a Deployment to create a ReplicaSet which, in turn, brings up one RabbitMQ Pod. In other words: ReplicaSets manage Pods. Deployments manage ReplicaSets.

# rabbitmq-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8

Now, let's say we need RabbitMQ with the management plugin. We need to replace rabbitmq:3.7.8 with rabbitmq:3.7.8-management. The new Deployment manifest defines the updated desired state for rabbitmq-deploy.

# rabbitmq-management-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-pod
  template:
    metadata:
      labels:
        name: rabbitmq-pod
    spec:
      restartPolicy: Always
      containers:
        - name: rabbitmq-container
          image: rabbitmq:3.7.8-management

Deploy the new Deployment version and see how it updates the ReplicaSet and Pod.

# apply rabbitmq-management-deploy.yaml
~$ kubectl apply -f rabbitmq-management-deploy.yaml
deployment.apps/rabbitmq-deploy configured

# list pods
~$ kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   1/1     Running             0          8m
rabbitmq-deploy-f98989967-qmxzn    0/1     ContainerCreating   0          2s

# list pods
~$ kubectl get pods
NAME                               READY   STATUS        RESTARTS   AGE
rabbitmq-deploy-7f86fcd959-fgtxr   0/1     Terminating   0          8m
rabbitmq-deploy-f98989967-qmxzn    1/1     Running       0          19s

# list replicasets
~$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
rabbitmq-deploy-7f86fcd959   0         0         0       13m
rabbitmq-deploy-f98989967    1         1         1       1m

# get details for rabbitmq-deploy-f98989967-qmxzn pod
~$ kubectl get pod rabbitmq-deploy-f98989967-qmxzn -o yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-11-23T16:33:38Z
  generateName: rabbitmq-deploy-f98989967-
  labels:
    name: rabbitmq-pod
    pod-template-hash: "954545523"
  name: rabbitmq-deploy-f98989967-qmxzn
  namespace: default
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: rabbitmq-deploy-f98989967
    uid: 87be145f-ef3d-11e8-886a-025000000001
  resourceVersion: "594134"
  selfLink: /api/v1/namespaces/default/pods/rabbitmq-deploy-f98989967-qmxzn
  uid: 87c0e8ca-ef3d-11e8-886a-025000000001
spec:
  containers:
  - image: rabbitmq:3.7.8-management
    imagePullPolicy: IfNotPresent
    name: rabbitmq-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-r7js4
      readOnly: true

RabbitMQ 3.7.8-management is successfully deployed, replacing RabbitMQ 3.7.8 and giving you access to the RabbitMQ management plugin. You now know how to create and deploy a self-healing RabbitMQ Kubernetes instance!
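
Why did the update produce a second ReplicaSet with a different suffix? The Deployment names each ReplicaSet after a hash of the Pod template (the pod-template-hash label in the output above), so a changed image means a changed hash and therefore a new ReplicaSet. The Python sketch below illustrates the idea only. It uses SHA-256 as a stand-in, which is not the hashing scheme Kubernetes uses, and the helper names are made up.

```python
import hashlib
import json


def template_hash(pod_template):
    """Stand-in for pod-template-hash: a short, stable hash of the template.

    Kubernetes uses its own hashing scheme; SHA-256 is only illustrative.
    """
    canonical = json.dumps(pod_template, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]


def replicaset_name(deployment_name, pod_template):
    """ReplicaSet name in the form <deployment>-<template hash>."""
    return f"{deployment_name}-{template_hash(pod_template)}"


def template(image):
    """Pod template of rabbitmq-deploy for a given image."""
    return {
        "metadata": {"labels": {"name": "rabbitmq-pod"}},
        "spec": {"containers": [{"name": "rabbitmq-container", "image": image}]},
    }


old = replicaset_name("rabbitmq-deploy", template("rabbitmq:3.7.8"))
new = replicaset_name("rabbitmq-deploy", template("rabbitmq:3.7.8-management"))
print(old != new)  # True: a changed template yields a new ReplicaSet name
```

Applying an unchanged manifest yields the same hash, which is why re-applying the same Deployment does not start a rollout.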

Services

We still lack a stable Pod IP address or DNS name. Remember that Pods are not durable. When a Pod dies, the ReplicaSet creates a new Pod instance. The new Pod's IP address differs from the old Pod's IP address. In order to run a Celery worker Pod, we need a stable connection to the RabbitMQ Pod.

Enter Services. A Service is another Kubernetes object. A Service gets its own stable IP address, a stable DNS name and a stable port. Services provide service discovery, load-balancing, and features to support zero-downtime deployments. Kubernetes provides several types of Services; we look at two of them here. A ClusterIP Service gives you a Service inside your cluster. Your apps inside your cluster can access that Service via a stable IP address, DNS name and port. A ClusterIP Service does not provide access from outside the cluster. A NodePort Service provides everything a ClusterIP Service provides, plus access to a Pod from outside the cluster.

Make the RabbitMQ Pod available inside the cluster under the Service name rabbitmq and expose the AMQP port 5672. Expose the RabbitMQ management UI (port 15672) externally on node port 30672.

# rabbitmq-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
spec:
  type: NodePort
  selector:
    name: rabbitmq-pod
  ports:
    - protocol: TCP
      port: 15672
      nodePort: 30672
      targetPort: 15672
      name: http
    - protocol: TCP
      port: 5672
      targetPort: 5672
      name: amqp

Deploy with kubectl and check the service's status:

# apply rabbitmq-service.yaml
~$ kubectl apply -f rabbitmq-service.yaml
service/rabbitmq created

# get details for rabbitmq service
~$ kubectl get service rabbitmq
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE
rabbitmq   NodePort   10.105.37.247   <none>        15672:30672/TCP,5672:32610/TCP   1m

The RabbitMQ management UI is now available on http://localhost:30672. From within the cluster, RabbitMQ is now accessible on amqp://guest:guest@rabbitmq:5672.

screenshot
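
To hand this address to a Celery worker later on, it helps to build the broker URL from the Service name rather than hard-coding it. The sketch below shows one way to do that. The helper names service_host and broker_url are made up, and cluster.local is assumed as the cluster domain, which is the usual default but may differ in your cluster.

```python
def service_host(service, namespace=None, cluster_domain="cluster.local"):
    """DNS name of a Service as seen from a Pod.

    Within the same namespace the short name is enough. From another
    namespace, use the fully qualified form. The cluster domain is an
    assumption and may differ per cluster.
    """
    if namespace is None:
        return service
    return f"{service}.{namespace}.svc.{cluster_domain}"


def broker_url(host, port=5672, user="guest", password="guest"):
    """AMQP URL for a broker reachable at host:port."""
    return f"amqp://{user}:{password}@{host}:{port}"


print(broker_url(service_host("rabbitmq")))
# amqp://guest:guest@rabbitmq:5672

print(broker_url(service_host("rabbitmq", namespace="default")))
# amqp://guest:guest@rabbitmq.default.svc.cluster.local:5672
```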

Next up

In this blog post, we built the foundations for migrating our Docker Compose Celery app to Kubernetes. We set up a self-healing RabbitMQ Deployment and a RabbitMQ service that gives us a stable URL. Now that we have a stable RabbitMQ URL, we can set up our Celery worker on Kubernetes. I will cover that in the next blog post.