Docker Health Check for Celery Workers

The Docker healthcheck directive instructs Docker to run a command to check if a container is still working as expected.

This helps detect cases where the container process is still running but unable to
operate as intended. For example, an unresponsive Celery worker node.

A Docker container with a healthcheck directive has its health status reported next to its normal Docker status.

$ docker compose ps

NAME                            ...     STATUS
healthcheck-celery-worker-1     ...     Up 3 minutes (healthy)

The healthcheck can also be used to trigger auto-healing actions, though this feature is only available in Docker swarm mode.

In this article, I will show you how to set up a healthcheck direcxtive for a dockerised Celery worker node. I will explain:

Monitoring a Celery worker

Celery provides several commands to inspect and manage Celery worker nodes. I frequently use the celery inspect ping command to broadcast a ping to all registered Celery workers:

$ celery inspect ping
->  celery@ec531987a721: OK
        pong
->  celery@2a2078c9ffef: OK
        pong
->  celery@30a5e769daf7: OK
        pong
->  celery@0ddd0198151c: OK
        pong

4 nodes online.

If the ping command does not receive a response in time, it raises an error: "Error: No nodes replied within time constraint". Using the the --destination option, you can ping a particular worker node:

$ celery inspect ping --destination celery@ec531987a721
->  celery@ec531987a721: OK
        pong

1 node online.

Note that the reason a node is unresponsive might be down to broker connectivity issues or the broker being down alltogether. You might want to choose a different approach instead of celery inspect ping if you have specific requirements.

Docker healthcheck

The Docker healthcheck directive consists of a command that Docker runs periodically to determine the container's health. It also consists of a set of parameters that specify the frequency at which the command is run.

You can specify the healthcheck in the Dockerfile so it becomes baked into the image:

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 CMD [ "executable" ]

You can also specify it in the Docker compose Yaml:

    worker:
        ...
        healthcheck:
          test: ["CMD", "executable"]
          interval: 30s
          timeout: 30s
          retries: 3
          start_period: 5s

With the parameters above, a fresh container defaults to "starting". After the initial start period (5 seconds), and every interval seconds (30 seconds), Docker runs "executable".

When the check passes (zero exit status), the container becomes "healthy". After a certain number (3 retries) of consecutive failures (non-zero exit status), it becomes "unhealthy".

Docker healthcheck for Celery workers

By now you know how to ping a particular Celery worker node to determine whether it is responsive (healthy). You also know how to set up a Docker healtcheck command.

What is left to do is dockerise the Celery worker and implement the celery inspect ping command as the Docker healthcheck directive.

I have created a Github repository. It consists of a Celery worker (worker.py) with one registered task (do_something) and a producer script (producer.py) that continuously invokes do_something via Celery.

The complete stack is defined in docker-compose.yml and includes Redis as a message broker. The Dockerfile contains the build instructions for the Celery worker node.

FROM python:3.11.1WORKDIR /appCOPY requirements.txt /app/requirements.txtRUN pip install -r requirements.txtCMD ["celery", "--app", "worker.app", "worker"]HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD ["/sh", "-c", "celery inspect ping --destination celery@$HOSTNAME"]

view rawDockerfile delivered with ❤ by emgithub

This CMD spins up the Celery worker node inside the container. The node registers itself under the name celery@$HOSTNAME. $HOSTNAME refers to the container's hostname which is usually the container's Docker ID - for example f9207bc94b22, unless you specify otherwise.

To make the HEALTHCHECK CMD

$ celery inspect ping --destination celery@$HOSTNAME

generic with respect to $HOSTNAME, you need Docker to resolve $HOSTNAME at runtime.

To support variable substitution, the shell format is required for the HEALTHCHECK CMD:

HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD celery inspect ping --destination celery@$HOSTNAME

Note the lack of any single or double quotes around the CMD in the shell format. You can also use the exec format and execute a shell directly:

HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD ["/sh", "-c", "celery inspect ping --destination celery@$HOSTNAME"]

When you spin up the container, docker will default to the "Starting" state for 3 seconds. After 3 seconds it will ping the container's Celery worker every 10 seconds. If the ping passes, the container becomes healthy. If the ping fails after 3 retries, the container becomes unhealthy.

You can also define the healthcheck in the Docker compose file. A Docker compose healthcheck property overrides any healthcheck directive defined in a Docker image:

healthcheck:
    test: ["CMD-SHELL", "celery inspect ping --destination celery@$$HOSTNAME"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 5s

Note the $$ for correctly escaping the environment variable. To disable a healthcheck directive alltogether, use:

healthcheck:
    test: ["NONE"]

Or, if you want to disable any baseimage's healthcheck directive:

HEALTHCHECK NONE

Play around with the example repository. Bring up the Git repository's Docker compose stack and check the celery-worker's health status with docker compose ps.

Remove the Redis broker docker rm -f healthcheck-celery-broker-1 to simulate the Celery worker node becoming unresponsive. Wait for the health state to change. Bring the Redis container back up again and observe how the Celery worker's status responds.

Scale the number of workers up with docker compose up -d --scale celery-worker=2 and check the health status in the additional worker.

Finally, to inspect and debug the healthcheck's output:

$ docker inspect --format "{{json .State.Health }}" healthcheck-celery-worker-1

In a future blog post, I will expand on the healthcheck topic and explore how they can be used for auto-healing in Docker swarm mode and kubernetes.

Happy Celery coding!

Docker Health Check for Celery Workers

Monitoring a Celery worker

Docker healthcheck

Docker healthcheck for Celery workers

More from this blog

Happy User, Happy Life: Real-Time Celery Progress Bars With FastAPI and HTMX

Celery Tasks: A Guide to SQLAlchemy Session Handling

Celery on Windows: What's the latest?

A Quick Guide to Celery Task Routing

Filesystem As Broker: A Quick Guide

Command Palette

Monitoring a Celery worker

Docker healthcheck

Docker healthcheck for Celery workers

More from this blog