Have you ever used the Celery solo execution pool? Don't worry if not, you are in good company. The solo pool is probably the most overlooked and underrated of all Celery built-in execution pools. I can only speculate as to why it is overlooked, so I focus on why I think it is underrated. The solo pool is - and I am quoting the Celery docs here:
The keyword here is inline. Remember how the worker and the pool are two separate concerns? That separation does not mean the worker and pool have to run in separate processes or threads, but most pool implementations do precisely that to support the concurrent execution of tasks within the pool.
Inline means that the worker directly invokes the task code. The task code runs in the same process and thread as the worker itself. No forking, threading or event-looping. The worker simply invokes the task, just like any other Python function call.
The separation of concerns design pattern is still adhered to. That function call still goes through the different layers and interfaces that allow the worker to be agnostic about the actual pool type.
Everything runs in the same process, in the same thread. No process or thread management is required, no overhead between receiving the task and running it. This makes the solo pool as fast as it gets.
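To make "inline" concrete, here is a toy sketch (not Celery's actual code) of what a solo-style pool boils down to: applying a task is just a direct function call in the current thread.

```python
# A toy illustration of inline execution, the way the solo pool behaves.
# The class and method names here are made up for the example.
class SoloPoolSketch:
    def apply(self, task, args=(), kwargs=None):
        # No fork, no thread, no event loop:
        # call the task like any other Python function.
        return task(*args, **(kwargs or {}))

def add(x, y):
    return x + y

pool = SoloPoolSketch()
result = pool.apply(add, (2, 3))
print(result)  # 5
```

The real solo pool still sits behind the worker's pool interface, but this is all the "pool management" it has to do.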
Another benefit of the solo pool's simplicity is that there are no restrictions when it comes to the use of multiprocessing or multithreading within a task. For example, if you try to do multiprocessing within a Celery task, using the default prefork pool:
```python
# worker.py
import os
from concurrent.futures import ProcessPoolExecutor

from celery import Celery

app = Celery(broker=os.environ["CELERY_BROKER_URL"])

def square(num):
    return num * num

@app.task
def task1():
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, i) for i in numbers]
        squared_numbers = [future.result() for future in futures]
    print(squared_numbers)
```
And run the worker using the prefork pool:
```shell
$ celery --app=worker.app worker --concurrency=1 --pool=prefork
```
You end up with a stack trace like this:
```
[2023-07-25 17:33:51,965: ERROR/ForkPoolWorker-1] Task task1[99acf138-c79f-4fb3-874c-cad56d725677] raised unexpected: AssertionError('daemonic processes are not allowed to have children')
...
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
```
The issue is that the prefork pool's worker processes are themselves forked, daemonic child processes, even if you run with a concurrency setting of one. To ensure that all child processes are terminated gracefully when the parent exits, Python does not allow daemonic processes to have children of their own. I will go into more detail in the next blog post of this series, which is dedicated to the prefork pool.
This limitation does not exist for the solo pool, as it runs in the same process as the worker itself. Changing the --pool argument to solo makes the task execute without issues, multiprocessing and all.
```shell
$ celery --app=worker.app worker --pool=solo
```
There is a downside to all this simplicity: no concurrency. The solo pool can only process one task at a time and the worker is blocked while the solo pool executes a task.
This requires a different scaling strategy to give your Celery cluster the ability to process multiple tasks concurrently. For example, instead of running a single worker with a prefork pool and a pool size (concurrency) of ten, run ten workers, each with a solo pool. In both cases, you end up with a cluster concurrency of ten.
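One way to start such a cluster is a simple shell loop. This is a sketch, not a production deployment recipe: the node naming scheme is arbitrary, and you may prefer a process supervisor over --detach.

```shell
# Start ten solo-pool workers, each with a unique node name.
# %h expands to the machine's hostname.
for i in $(seq 1 10); do
  celery --app=worker.app worker --pool=solo --hostname="worker$i@%h" --detach
done
```

Each worker consumes from the same broker queue, so the ten single-task workers together give you the same cluster concurrency as one prefork worker with a pool size of ten.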
I hope you have enjoyed learning more about the solo pool and where it shines. If you have any questions, comment below 👇 or drop me an email at email@example.com.