Coroutines are fundamental to Python's asyncio library, which provides infrastructure for writing concurrent code using the async/await syntax. This is particularly beneficial for I/O-bound operations (for example, network requests and file I/O), where waiting for external resources would otherwise block the main thread.
Adopting coroutines gave a huge boost to the number of API calls and tasks we could handle on a day-to-day basis. It came as a need of the moment to replace the Celery code in our communications service's code base. The reasons for replacing the Celery architecture in the service with coroutines are covered below.
Below is an example of a simple coroutine-based implementation in Python.
import asyncio

async def greet(name):
    print(f"Hello {name} 👋")
    await asyncio.sleep(1)  # Simulates I/O delay
    print(f"Goodbye {name} 👋")

async def main():
    await asyncio.gather(
        greet("Alice"),
        greet("Bob"),
        greet("Charlie"),
    )

# Entry point
asyncio.run(main())
Output:
Hello Alice 👋
Hello Bob 👋
Hello Charlie 👋
Goodbye Alice 👋
Goodbye Bob 👋
Goodbye Charlie 👋
What’s Happening?
asyncio.gather schedules all three greet coroutines on a single event loop. Each coroutine prints its greeting, then yields control at await asyncio.sleep(1), letting the next one run; after the simulated delay, each resumes and prints its goodbye. That is why all three hellos appear before any goodbye, and the whole run takes about one second instead of three.
We had a Celery architecture for making API calls to various vendors for our WhatsApp, SMS, and call requests. With increasing loads and a growing quantum of API calls, we ended up creating a lot of Celery tasks and incurring the cost of running the Celery machines at full load. As mentioned earlier, to make 1,000 API calls concurrently, Celery has to spawn 1,000 workers for all the calls to go through together. This issue persisted across all our communication services: traffic through them has grown 2-3X over the last year, and with WhatsApp coming in as a major channel of communication, the traffic spikes have grown as well.
This demanded a solution, from both a business and a tech point of view, to make our systems more reliable, efficient, and effective at processing huge traffic loads. We then thought of using async Kafka consumers. Yes, you read that right! We built consumers with an infinite event loop running inside them: we publish tasks to a Kafka broker, consume them in bulk, and let the event loop process the tasks concurrently, making the API calls and then performing bulk DB writes and updates accordingly.
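Here is a minimal sketch of what such a consumer can look like, using the aiokafka library. The topic name, group id, batch size, and the handle_message helper are illustrative assumptions, not our exact production code:

import asyncio
from aiokafka import AIOKafkaConsumer

async def handle_message(payload: bytes) -> None:
    # Stand-in for the real work: make the vendor API call for this
    # message and stage the result for a bulk DB write.
    await asyncio.sleep(0.1)  # simulated I/O

async def consume_forever() -> None:
    consumer = AIOKafkaConsumer(
        "communication-tasks",              # illustrative topic name
        bootstrap_servers="localhost:9092",
        group_id="comms-async-consumer",
        enable_auto_commit=False,           # commit only after the batch succeeds
    )
    await consumer.start()
    try:
        while True:  # the "infinite event loop" inside the consumer
            # Consume in bulk: up to 50 records across all assigned partitions.
            batch = await consumer.getmany(timeout_ms=1000, max_records=50)
            records = [r for part in batch.values() for r in part]
            if not records:
                continue
            # Process the whole batch concurrently on one event loop.
            await asyncio.gather(*(handle_message(r.value) for r in records))
            await consumer.commit()
    finally:
        await consumer.stop()

if __name__ == "__main__":
    asyncio.run(consume_forever())

Disabling auto-commit and committing only after the batch succeeds keeps at-least-once semantics: if the consumer dies mid-batch, the messages are redelivered.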
We saw a huge delta in our peak traffic processing capability. Our entire 20-pod Celery traffic was now being consumed by a handful of Kafka consumers running in an async fashion, with the capacity to handle 4-5X the original load with no extra consumers spun up and no extra infra consumed.
Using the asyncio library has its ups and downs. One must not assume that adding async def is a straightforward fix for the entire problem; implementing coroutines in our service was a task of its own.
Converting Celery code into coroutines was not as easy as it seemed. Below is a small example.
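The snippet below is a representative sketch rather than our actual service code; the vendor URL, payload shape, and function names are illustrative assumptions:

import asyncio
import httpx
from celery import Celery

app = Celery("comms")

# Before: one Celery task per API call, so 1,000 concurrent calls
# need 1,000 workers available at the same time.
@app.task
def send_via_vendor(payload: dict) -> None:
    httpx.post("https://vendor.example.com/send", json=payload)  # blocking call

# After: one coroutine takes a whole batch of payloads for a vendor
# and fires all the API calls concurrently on a single event loop.
async def send_batch(payloads: list[dict]) -> None:
    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            *(client.post("https://vendor.example.com/send", json=p)
              for p in payloads)
        )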
Earlier, we had a Celery task that took data from our API calls and triggered the vendor APIs one by one, which made us dependent on spawning multiple workers at once to handle traffic. The above code takes dozens of data points at once and, per vendor, makes all the API calls concurrently. This raises our computation and processing throughput several-fold.
Earlier, we saw latencies of 100-150 ms for one API call plus a database insert. Now we see a total of 350-450 ms for 12-15 API calls plus a DB insert, which means a single consumer can process 2-3 such batches, i.e., handle 35-45 requests, in one second.
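As a back-of-envelope check on those numbers: at roughly 400 ms per batch, one consumer completes about 1 / 0.4 ≈ 2.5 batches per second, and at ~14 API calls per batch that works out to 2.5 × 14 ≈ 35 requests per second from a single consumer, consistent with the 35-45 range above.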
Coroutines are a powerful architecture, but they come with their own caveats. Since we take in bulk data and fire huge numbers of concurrent API calls and processes, we see a spike in CPU requests. Our business use case justifies this, and the trade-off works in our favour: one consumer with 1-1.5 CPU cores is much more economical than 10-15 Celery consumers with 0.5 CPU core each running simultaneously.
Since implementing coroutines, our 30-pod Celery traffic has gone down by 90-95%, and all of it is handled today by 6-8 consumers.
The main points to watch out for are CPU and memory consumption, throttling, and potential choke points. With these covered, coroutines make far more efficient use of resources at much higher traffic than a traditional Celery architecture.
For comparison: our Celery machines handle this traffic with 20-30 consumers running, while our Kafka consumers with purely async code handle the same traffic with just 2 consumers running.
FAQs
1. What are Coroutines used for?
Coroutines are used in Python for writing asynchronous, non-blocking code—ideal for I/O-bound tasks like network requests or file operations.
2. What is Celery in Python used for?
Celery is used to run background tasks and schedule jobs asynchronously, often in distributed systems.
3. What is Kafka used for in Python?
Kafka is used to build real-time data pipelines and streaming applications by publishing, consuming, and processing large volumes of messages efficiently.
Hashtags:
#Shadowfax #Coroutines #Celery #Python #TechInnovation #SystemPerformance #TechTransformation #TechAtShadowfax #Kafka #Concurrency