GoodJob is a multithreaded, Postgres-based, ActiveJob backend for Ruby on Rails. GoodJob has many features that take it beyond ActiveJob. One such feature is cron-like functionality that allows scheduling repeated jobs on a fixed schedule.
This post is a brief technical story of how GoodJob prevents duplicated cron jobs from running in a multi-process, distributed environment.
This is all true as of GoodJob’s current version, 3.7.4.
Briefly, how GoodJob’s cron works
GoodJob heavily leans on
Concurrent::Ruby high-level primitives, and the cron implementation is no different.
GoodJob::CronManager accepts a fixed hash of schedule configuration and feeds them into
Concurrent::ScheduledTasks, which then trigger
perform_later on the job classes at the prescribed times.
A locking strategy is necessary. GoodJob can be running across multiple processes, across numerous isolated servers or containers, in one application. GoodJob should guarantee that at the scheduled time, only a single scheduled job is enqueued.
Initially, advisory locks
When GoodJob’s cron feature was first introduced in version 1.12, Cron used an existing feature of GoodJob: Concurrency Control. Concurrency Control places limits on how many jobs can be enqueued or performed at the same time.
Concurrency Control works by assigning jobs a “key” which is simply a queryable string. Before enqueuing jobs, GoodJob will count how many job records already exist with that same key and prevent the action if the count exceeds the configured limit. GoodJob uses advisory locks to avoid race conditions during this accounting phase.
There were some downsides to using Concurrency Control for Cron.
- It was a burden on developers. Concurrency Control extends
ActiveJob::Baseand required the developer to configure Concurrency Control rules separately from the Cron configuration.
- It wasn’t very performant. Concurrency Control’s design is optimistic and works best when collisions are rare or infrequent. But a large, clock-synchronized formation of GoodJob processes is a pessimistic concurrency scenario, and it could take several seconds of advisory locking and unlocking across all the processes to insert a single job.
Then, a unique index
GoodJob v2.5.0 changed the cron locking strategy. Instead of using Concurrency Control’s advisory locks, GoodJob uses a unique compound index to prevent the same cron job from being enqueued/
INSERTed into the database multiple times.
- In addition to the existing
cron_keycolumn in job records, the change added a new timestamp column,
cron_atto store when the cron job is enqueued.
- Added a unique index on
[cron_key, cron_at]to ensure that only one job is inserted for the given key and time.
- Handled the expected
ActiveRecord::RecordNotUniquewhen multiple cron processes try to enqueue the same cron job simultaneously and the unique index prevents the
INSERTfrom taking place
Now, when a thundering herd of GoodJob processes tries to enqueue the same cron job at the same time, the database uses its unique index constraint to prevent multiple job records from being created. Great!
Does it do a good job?
Yes! I’ve received lots of positive feedback in the year+ since GoodJob’s cron moved to a unique index locking strategy. From the application perspective, there’s much less enqueueing latency using a unique index than when using advisory locks. And from the developer’s perspective, it does just work without additional configuration beyond the schedule.
Using a unique index does require preserving the job records for a bit after the jobs have been performed. Otherwise, poor clock synchronization across processes could lead to a duplicate job being inserted again if the job has already been performed and removed from the table/index. Fortunately, preserving job records should not be too burdensome because GoodJob will automatically clean them up too.
Lastly, one goal of writing this is the hope/fear that a Database Administrator will tell me this is a terrible strategy and provide a better one. Until that happens, I have confidence GoodJob’s cron is good. I’d love your feedback!