The Docker healthcheck directive instructs Docker to run a command to check if a container is still working as expected.
This helps detect cases where the container process is still running but unable to
operate as intended, for example an unresponsive Celery worker node.
A Docker container with a healthcheck directive has its health status reported next to its normal Docker status.
$ docker compose ps
NAME ... STATUS
healthcheck-celery-worker-1 ... Up 3 minutes (healthy)
The healthcheck can also be used to trigger auto-healing actions, though this feature is only available in Docker Swarm mode.
In this article, I will show you how to set up a healthcheck directive for a dockerised Celery worker node. I will explain how to monitor a Celery worker, how the Docker healthcheck directive works, and how to combine the two.
Monitoring a Celery worker
Celery provides several commands to inspect and manage Celery worker nodes. I frequently use the celery inspect ping command to broadcast a ping to all registered Celery workers:
$ celery inspect ping
-> celery@ec531987a721: OK
pong
-> celery@2a2078c9ffef: OK
pong
-> celery@30a5e769daf7: OK
pong
-> celery@0ddd0198151c: OK
pong
4 nodes online.
If the ping command does not receive a response in time, it raises an error: "Error: No nodes replied within time constraint". Using the --destination option, you can ping a particular worker node:
$ celery inspect ping --destination celery@ec531987a721
-> celery@ec531987a721: OK
pong
1 node online.
Note that a node might be unresponsive because of broker connectivity issues or because the broker is down altogether. If you have specific requirements, you might want to choose a different approach instead of celery inspect ping.
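If you want to run the same check from a script rather than an interactive shell, you can wrap it in a few lines of Python. This is a minimal sketch and not part of the example repository: the function names build_ping_command and worker_is_responsive are made up for illustration, and it assumes that the celery CLI is on the PATH, is already configured to reach the broker, and exits with a non-zero status when no node replies.

```python
import subprocess


def build_ping_command(node_name):
    """Return the argv for pinging a single worker node (hypothetical helper)."""
    return ["celery", "inspect", "ping", "--destination", node_name]


def worker_is_responsive(node_name, timeout=10.0, run=subprocess.run):
    """Return True if the ping command exits with status 0 within `timeout` seconds.

    `run` is injectable so the logic can be exercised without a broker.
    """
    try:
        result = run(
            build_ping_command(node_name),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Treat a hung command or a missing celery executable as "not responsive".
        return False
    return result.returncode == 0
```

With a running stack, worker_is_responsive("celery@ec531987a721") should mirror what the command line reports. A broker outage and a stuck worker both show up as False here, just as they both show up as a failed ping on the command line.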
Docker healthcheck
The Docker healthcheck directive consists of a command that Docker runs periodically to determine the container's health, plus a set of parameters that control how often the command runs, how long it may take, and how many failures are tolerated.
You can specify the healthcheck in the Dockerfile so it becomes baked into the image:
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 CMD [ "executable" ]
You can also specify it in the Docker Compose YAML file:
worker:
  ...
  healthcheck:
    test: ["CMD", "executable"]
    interval: 30s
    timeout: 30s
    retries: 3
    start_period: 5s
With the parameters above, a fresh container starts in the "starting" state. Docker runs "executable" for the first time one interval (30 seconds) after the container starts, and then again 30 seconds after each previous check completes. Failures during the start period (the first 5 seconds) do not count towards the retry limit.
When a check passes (zero exit status), the container becomes "healthy". After 3 consecutive failures (a non-zero exit status, or a check that exceeds the 30-second timeout), it becomes "unhealthy".
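To make the state transitions concrete, here is a small simulation. It is a simplified model written for this article, not Docker's actual implementation: the function name simulate_health is hypothetical, and it assumes that the first check runs one interval after the container starts, that each check takes no time, and that failures only count towards the retries once the start period is over or a check has succeeded.

```python
def simulate_health(probe_results, interval=30, start_period=5, retries=3):
    """Return the container status after each probe in a simplified model.

    probe_results: list of booleans, True meaning a zero exit status.
    Assumes probe i runs at (i + 1) * interval seconds and takes no time.
    """
    status = "starting"
    failing_streak = 0
    statuses = []
    for index, passed in enumerate(probe_results):
        elapsed = (index + 1) * interval
        if passed:
            # A single success is enough to become (or stay) healthy.
            status = "healthy"
            failing_streak = 0
        else:
            # Failures are ignored only while still starting inside the start period.
            in_grace = status == "starting" and elapsed <= start_period
            if not in_grace:
                failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        statuses.append(status)
    return statuses


print(simulate_health([False, False, False]))       # ['starting', 'starting', 'unhealthy']
print(simulate_health([True, False, False, True]))  # ['healthy', 'healthy', 'healthy', 'healthy']
```

The second call shows why the failures must be consecutive: two failed checks followed by a success reset the counter, so the container never leaves the healthy state.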
Docker healthcheck for Celery workers
By now you know how to ping a particular Celery worker node to determine whether it is responsive (healthy). You also know how to set up a Docker healthcheck command.
What is left to do is dockerise the Celery worker and implement the celery inspect ping command as the Docker healthcheck directive.
I have created a GitHub repository. It consists of a Celery worker (worker.py) with one registered task (do_something) and a producer script (producer.py) that continuously invokes do_something via Celery.
The complete stack is defined in docker-compose.yml and includes Redis as a message broker. The Dockerfile contains the build instructions for the Celery worker node.
FROM python:3.11.1
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
CMD ["celery", "--app", "worker.app", "worker"]
HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD ["/bin/sh", "-c", "celery inspect ping --destination celery@$HOSTNAME"]
This CMD spins up the Celery worker node inside the container. The node registers itself under the name celery@$HOSTNAME. $HOSTNAME refers to the container's hostname, which is usually the container's Docker ID (for example f9207bc94b22) unless you specify otherwise.
To make the HEALTHCHECK CMD
$ celery inspect ping --destination celery@$HOSTNAME
generic with respect to $HOSTNAME, you need $HOSTNAME to be resolved at runtime inside the container.
Variable substitution requires a shell, so the simplest option is the shell form of the HEALTHCHECK CMD:
HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD celery inspect ping --destination celery@$HOSTNAME
Note the lack of any brackets or quotes around the CMD in the shell form. You can also use the exec form and execute a shell directly:
HEALTHCHECK --interval=10s --timeout=10s --start-period=3s --retries=3 CMD ["/bin/sh", "-c", "celery inspect ping --destination celery@$HOSTNAME"]
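The difference between running a command directly and running it through a shell is easy to reproduce outside Docker. The sketch below is only an analogy: Python's subprocess module stands in for Docker, echo stands in for celery inspect ping, and the helper name run_and_capture is made up. It assumes a POSIX system with /bin/sh and an echo executable on the PATH.

```python
import os
import subprocess


def run_and_capture(argv, hostname):
    """Run argv with HOSTNAME set and return its stripped stdout."""
    env = dict(os.environ, HOSTNAME=hostname)
    completed = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


# Exec-style: the argument reaches the program untouched, no shell is involved.
exec_style = run_and_capture(["echo", "celery@$HOSTNAME"], "f9207bc94b22")

# Shell-style: /bin/sh expands $HOSTNAME before echo runs.
shell_style = run_and_capture(
    ["/bin/sh", "-c", "echo celery@$HOSTNAME"], "f9207bc94b22"
)

print(exec_style)   # celery@$HOSTNAME
print(shell_style)  # celery@f9207bc94b22
```

Without a shell, the worker name would be looked up literally as celery@$HOSTNAME, which no node answers to, so the ping would always fail.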
When you spin up the container, Docker puts it in the "starting" state. It then pings the container's Celery worker every 10 seconds; failures within the first 3 seconds (the start period) do not count towards the retries. As soon as a ping passes, the container becomes healthy. After 3 consecutive failed pings, the container becomes unhealthy.
You can also define the healthcheck in the Docker Compose file. A Docker Compose healthcheck property overrides any healthcheck directive defined in a Docker image:
healthcheck:
  test: ["CMD-SHELL", "celery inspect ping --destination celery@$$HOSTNAME"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 5s
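Note the $$, which escapes the dollar sign so that Docker Compose does not substitute the variable itself and the shell inside the container resolves it instead. To disable a healthcheck directive altogether, use: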
healthcheck:
  test: ["NONE"]
Or, if you want to disable any base image's healthcheck directive in your Dockerfile:
HEALTHCHECK NONE
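Coming back to the $$ in the Docker Compose healthcheck: the reason it is needed is that Docker Compose substitutes $VARIABLE references itself, using the environment of the machine you run docker compose on, before the container ever sees the command. The following sketch mimics that behaviour in a deliberately simplified way. The function compose_interpolate is hypothetical, it only handles $NAME, ${NAME} and $$, and the real Compose syntax supports more (default values, for example).

```python
import re

# Matches "$$", "${NAME}" or "$NAME".
_VARIABLE = re.compile(r"\$(?:(\$)|\{(\w+)\}|(\w+))")


def compose_interpolate(value, host_env):
    """Simplified model of Compose variable substitution (illustration only)."""

    def replace(match):
        if match.group(1):
            # "$$" is an escaped dollar sign and becomes a literal "$".
            return "$"
        name = match.group(2) or match.group(3)
        # Unset variables are replaced by an empty string.
        return host_env.get(name, "")

    return _VARIABLE.sub(replace, value)


host_env = {"HOSTNAME": "my-laptop"}
print(compose_interpolate("celery@$HOSTNAME", host_env))   # celery@my-laptop
print(compose_interpolate("celery@$$HOSTNAME", host_env))  # celery@$HOSTNAME
```

With a single $, the healthcheck would ping a worker named after your own machine. With $$, the literal $HOSTNAME survives and is expanded by the shell inside the container.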
Play around with the example repository. Bring up the Git repository's Docker Compose stack and check the celery-worker's health status with docker compose ps.
Remove the Redis broker with docker rm -f healthcheck-celery-broker-1 to simulate the Celery worker node becoming unresponsive. Wait for the health state to change. Bring the Redis container back up again and observe how the Celery worker's status responds.
Scale the number of workers up with docker compose up -d --scale celery-worker=2 and check the health status of the additional worker.
Finally, to inspect and debug the healthcheck's output:
$ docker inspect --format "{{json .State.Health }}" healthcheck-celery-worker-1
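The command prints a JSON object that includes the current status, the number of consecutive failures, and a log of the most recent checks. If you want to pull out just the essentials, a short helper like the one below can do it. This is a sketch: the function name summarise_health is made up, and the sample input is hand-written for illustration rather than captured from a real container.

```python
import json


def summarise_health(raw_json):
    """Extract the status, failing streak and last check result from the health JSON."""
    health = json.loads(raw_json)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


# Hand-written sample in the shape of .State.Health (values are illustrative only).
sample = json.dumps({
    "Status": "unhealthy",
    "FailingStreak": 3,
    "Log": [
        {"ExitCode": 1, "Output": "Error: No nodes replied within time constraint\n"},
    ],
})

print(summarise_health(sample)["status"])       # unhealthy
print(summarise_health(sample)["last_output"])  # Error: No nodes replied within time constraint
```

The last_output field is usually the most useful one when debugging, because it contains whatever the healthcheck command printed on its final run.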
In a future blog post, I will expand on the healthcheck topic and explore how healthchecks can be used for auto-healing in Docker Swarm mode and Kubernetes.
Happy Celery coding!