Reactive Task Dispatch: Why Celery Beat Is Not Enough
The problem with periodic scheduling
Celery beat is a scheduler: it fires tasks on a fixed interval. For administrative work — health checks, metrics rollups, log rotation — that is exactly the right tool. The task either runs or it doesn't; missing one cycle is harmless.
Dispatching work to idle workers is a different story.
If a worker finishes a job at 10:00:01 and beat is configured to check for new work every 30 seconds, the worker sits idle until 10:00:30. Multiply that across dozens of workers and queues and you introduce up to 30 seconds of artificial latency on every job.
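The cost is easy to quantify: with a polling interval T, the idle wait is at most T and averages about T/2. A quick simulation sketch (the `idle_wait` helper is illustrative):

```python
import random

def idle_wait(finish_time, poll_interval):
    """Seconds a worker sits idle until the next poll tick."""
    return (-finish_time) % poll_interval

random.seed(0)
T = 30  # beat interval in seconds
waits = [idle_wait(random.uniform(0, 1000), T) for _ in range(100_000)]
avg = sum(waits) / len(waits)
assert max(waits) < T          # worst case approaches the full interval
assert abs(avg - T / 2) < 1    # average is roughly half the interval
```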
More critically, beat is fundamentally time-driven, not state-driven. It reacts to the passage of time, not to changes in your system.
There is also a resilience concern. In many deployments, beat relies on coordination via systems like Redis. If that coordination layer fails:
- Periodic tasks stop firing
- Workers drain their current work and go idle
- New jobs accumulate but are not dispatched
For low-latency, state-driven pipelines, this is the wrong failure mode.
The reactive alternative: database signals as event triggers
Every meaningful state transition in a pipeline is already represented as a database write:
- A new job arrives → a queue record is inserted
- A worker picks up a job → an in-progress record is inserted
- A worker finishes a job → an in-progress record is deleted
Most frameworks (e.g. Django, SQLAlchemy) expose lifecycle hooks around these changes.
The key idea:
Instead of polling for work, trigger dispatch when state changes.
```
# pseudocode
on job_completed(job):
    enqueue(dispatch_next_available, countdown=1s, expires=15s)

on job_created(job):
    enqueue(dispatch_next_available, countdown=1s, expires=15s)
```
Every time a job finishes or a new one arrives, a dispatch attempt is triggered immediately. Idle capacity is filled as soon as it appears — not when a clock ticks.
Correctness: triggering at the right moment
There is an important subtlety here.
Many ORM hooks fire inside the database transaction, before it is committed. If you enqueue a dispatch task at that point, you risk triggering work for a state change that is later rolled back.
To avoid this, dispatch triggers should be emitted only after commit:
- Use post-commit hooks (e.g. `transaction.on_commit` in Django)
- Or adopt an outbox pattern for stricter guarantees
Without this, the system can exhibit "phantom dispatches" that are hard to debug.
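The commit-ordering rule can be sketched independently of any framework. The toy `Transaction` class below (all names hypothetical) mirrors the guarantee that Django's `transaction.on_commit` provides: callbacks registered during a transaction run only if it commits, and are discarded on rollback.

```python
class Transaction:
    """Toy transaction that defers callbacks until commit,
    mimicking Django's transaction.on_commit semantics."""

    def __init__(self):
        self._on_commit = []

    def on_commit(self, callback):
        self._on_commit.append(callback)

    def commit(self):
        # The state change is now durable; safe to trigger dispatch.
        for cb in self._on_commit:
            cb()
        self._on_commit.clear()

    def rollback(self):
        # The state change never happened; discard queued triggers
        # instead of emitting a phantom dispatch.
        self._on_commit.clear()


dispatched = []
tx = Transaction()
tx.on_commit(lambda: dispatched.append("dispatch_next_available"))
tx.rollback()
assert dispatched == []  # rolled-back change triggers nothing

tx.on_commit(lambda: dispatched.append("dispatch_next_available"))
tx.commit()
assert dispatched == ["dispatch_next_available"]
```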
What this actually is: an event-driven scheduler
This approach does not eliminate scheduling — it changes how scheduling works.
Instead of a time-driven scheduler (beat), you now have an event-driven scheduler:
- `dispatch_next_available` scans for work
- assigns it to workers
- and enforces concurrency/ordering rules
The difference is not whether scheduling exists, but what triggers it:
- Beat → time-based triggers
- Signals → state-change triggers
For reactive systems, the latter is usually a better fit.
Why this improves resilience
With this model, dispatch depends on:
- the database (source of truth)
- the message broker (e.g. RabbitMQ)
It does not depend on a central scheduler loop.
As long as state changes are written to the database and messages can be enqueued, the system continues to make progress — even if auxiliary systems like Redis-backed schedulers are unavailable.
This shifts failure modes from "global pause" to "localized degradation," which is generally easier to tolerate.
Idempotency and deduplication
Event-driven systems introduce fan-out:
Ten jobs complete → ten dispatch triggers fire.
This is manageable with standard techniques:
- Task expiration (`expires`) — stale dispatch attempts are dropped by the broker.
- Distributed locking — only one instance of the dispatch logic runs at a time.
- Idempotent dispatch logic — running dispatch multiple times produces the same result.
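A minimal sketch of the idempotency property, using in-memory dicts in place of real job and worker records (all names and structures are illustrative): running dispatch twice produces the same assignments as running it once.

```python
def dispatch_next_available(jobs, workers):
    """Assign each idle worker at most one pending job.

    Safe to call any number of times: already-assigned jobs and busy
    workers are skipped, so repeated runs converge to the same state.
    """
    idle = [w for w in workers if w["job"] is None]
    pending = [j for j in jobs if j["state"] == "pending"]
    for worker, job in zip(idle, pending):
        job["state"] = "assigned"
        worker["job"] = job["id"]


jobs = [{"id": 1, "state": "pending"}, {"id": 2, "state": "pending"}]
workers = [{"id": "w1", "job": None}]

dispatch_next_available(jobs, workers)
dispatch_next_available(jobs, workers)  # duplicate trigger: no effect
assert workers[0]["job"] == 1
assert [j["state"] for j in jobs] == ["assigned", "pending"]
```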
Optionally, you can add debouncing / coalescing (collapse bursts into a single trigger) and single-flight guarantees (only one in-flight dispatch attempt cluster-wide). These patterns prevent the "thundering herd" problem without sacrificing responsiveness.
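One way to sketch the coalescing idea in-process (a hypothetical helper, not a distributed implementation): triggers that arrive while a dispatch is running set a dirty flag instead of starting another run, and the flag is drained with a single follow-up.

```python
import threading

class CoalescingDispatcher:
    """Collapse bursts of triggers: any number of fire() calls while a
    dispatch is in flight result in exactly one follow-up run."""

    def __init__(self, dispatch_fn):
        self._dispatch = dispatch_fn
        self._lock = threading.Lock()
        self._running = False
        self._dirty = False

    def fire(self):
        with self._lock:
            if self._running:
                self._dirty = True  # coalesce into the in-flight run
                return
            self._running = True
        self._drain()

    def _drain(self):
        while True:
            self._dispatch()
            with self._lock:
                if not self._dirty:
                    self._running = False
                    return
                self._dirty = False  # one follow-up covers the burst


calls = []
d = None

def dispatch():
    calls.append("run")
    if len(calls) == 1:  # simulate a burst arriving mid-dispatch
        for _ in range(3):
            d.fire()

d = CoalescingDispatcher(dispatch)
d.fire()
# three triggers during the first run coalesced into one follow-up
assert calls == ["run", "run"]
```

A cluster-wide single-flight guarantee needs the distributed lock mentioned above; this sketch only shows the coalescing logic itself.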
Tradeoffs
This approach is not free. It introduces different constraints:
- Stronger coupling to the database layer
- More write-triggered activity
- Transactional complexity
- Operational discipline required (idempotency, locking, observability)
In exchange, you get significantly lower latency and better alignment with how the system actually behaves.
Combining both approaches
The correct architecture uses both mechanisms, with clear separation of roles:
- Model lifecycle hooks / post-commit events — immediate, state-driven actions: dispatch, scaling signals, cooldown triggers.
- Celery beat — periodic, low-priority work: cleanup, metrics, health checks.
The key rule:
Do not use a clock to react to events. If something needs to happen because the system changed, trigger it from the place where that change occurs.
Summary
Celery beat is a clock. Model signals are interrupts.
Use the clock for things that should happen regularly.
Use interrupts for things that should happen now.
For pipelines where latency and throughput matter, moving from polling to event-driven dispatch is one of the highest-leverage architectural shifts you can make.