PostgreSQL's SKIP LOCKED is the go-to pattern for building task queues. Lock a row, skip it if someone else already has it, move on. It works. Most of the time.
But the standard two-step approach — SELECT ... FOR UPDATE SKIP LOCKED, then a separate UPDATE — has a race window. If anything goes wrong between the two statements, rows can be handed out without ever being marked as claimed, and another worker will pick them up again. You end up with duplicate task execution, and no amount of retries fixes a fundamentally broken acquisition pattern.
The Standard SKIP LOCKED Pattern
The textbook approach looks like this:
```sql
-- Step 1: Claim tasks
SELECT id, payload
FROM tasks
WHERE status = 'pending'
  AND scheduled_at <= NOW()
ORDER BY priority DESC, scheduled_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- Step 2: Mark them as claimed
UPDATE tasks
SET status = 'running', claimed_by = 'worker-1', claimed_at = NOW()
WHERE id IN (/* ids from step 1 */);
```

This is what most tutorials show. It's clean, readable, and works perfectly at low concurrency. One worker, one queue — no problem.
The FOR UPDATE SKIP LOCKED clause does two things: it acquires a row-level lock on the selected rows, and it skips any rows that are already locked by another transaction. This means two concurrent SELECT statements will never return the same rows. So far, so good.
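That disjointness guarantee can be sketched with an in-memory stand-in (a hypothetical simulation, not PostgreSQL): each row carries its own lock, a claimer tries each lock without waiting, and two concurrent claimers therefore always walk away with disjoint row sets.

```python
import threading

# Ten "pending" rows, each guarded by its own lock — a stand-in for
# PostgreSQL's row-level locks. This is a simulation, not real SQL.
row_locks = {row_id: threading.Lock() for row_id in range(10)}

def claim(limit):
    """Claim up to `limit` rows, skipping any row whose lock is already held."""
    got = []
    for row_id, lock in row_locks.items():
        # blocking=False is the SKIP LOCKED part: don't wait, just move on
        if lock.acquire(blocking=False):
            got.append(row_id)
            if len(got) == limit:
                break
    return got

results = []
workers = [threading.Thread(target=lambda: results.append(claim(5)))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

a, b = results
assert set(a).isdisjoint(set(b))  # concurrent claimers never overlap
assert len(a) + len(b) == 10      # together they drain the queue
```

Whichever way the two threads interleave, each lock is granted to at most one claimer, so overlap is impossible — the same property PostgreSQL enforces for the two concurrent SELECTs.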
The Race Window
The problem appears when steps 1 and 2 are separate statements. Here's the timeline:
```
Worker A: BEGIN
Worker A: SELECT ... FOR UPDATE SKIP LOCKED  → gets rows [1, 2, 3]
          (rows are locked)
Worker B: BEGIN
Worker B: SELECT ... FOR UPDATE SKIP LOCKED  → skips [1, 2, 3], gets [4, 5, 6]
Worker A: UPDATE ... SET status = 'running'  → marks [1, 2, 3]
Worker A: COMMIT                             → locks released
Worker C: BEGIN
Worker C: SELECT ... FOR UPDATE SKIP LOCKED  → gets [7, 8, 9] ✓ fine
```
This works. But consider what happens when the application layer introduces latency between the SELECT and UPDATE:
```
Worker A: BEGIN
Worker A: SELECT ... FOR UPDATE SKIP LOCKED  → gets rows [1, 2, 3]
-- Application does validation, logging, metric recording...
-- 50ms pass
Worker A: UPDATE ... SET status = 'running'  → marks [1, 2, 3]
Worker A: COMMIT
-- Meanwhile, Worker A's application layer starts processing row 1
-- But Worker A's transaction already committed
-- The row lock is released
Worker B: BEGIN
Worker B: SELECT ... FOR UPDATE SKIP LOCKED  → row 1 is 'running', filtered ✓
```
In this case the WHERE status = 'pending' filter saves you — Worker B won't pick up row 1 because its status is already 'running'. But this relies on the UPDATE having happened and committed before Worker B queries. If Worker A's application crashes between SELECT and UPDATE — or if the UPDATE fails — the rows remain 'pending' with no lock. They'll be picked up again, which is fine for retry semantics but dangerous if your tasks have side effects.
The real risk is more subtle: if your application does anything between claiming tasks and starting execution that could fail, you have a window where tasks are locked but not marked, or marked but not yet processed. This is the gap that leads to duplicate execution, dropped tasks, or both.
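The crash-between-statements case can be made concrete with a small in-memory sketch (hypothetical, not real SQL; row locks are elided because a crashed transaction releases them anyway): a worker that dies between the SELECT and the UPDATE leaves its rows 'pending', so the next claimer is handed the very same tasks.

```python
# In-memory sketch of the two-step claim's failure window.
tasks = {1: "pending", 2: "pending", 3: "pending"}

def two_step_claim(crash_before_update=False):
    # Step 1: SELECT ... WHERE status = 'pending' FOR UPDATE SKIP LOCKED
    claimed = [t for t, s in tasks.items() if s == "pending"]
    if crash_before_update:
        # Crash here: the transaction rolls back, the row locks evaporate,
        # and the UPDATE that would flip the status never runs.
        return claimed
    # Step 2: UPDATE ... SET status = 'running'
    for t in claimed:
        tasks[t] = "running"
    return claimed

first = two_step_claim(crash_before_update=True)  # Worker A claims, then dies
second = two_step_claim()                         # Worker B gets the same rows
assert first == second == [1, 2, 3]  # the same tasks were handed out twice
```

Re-delivery like this is exactly what you want for retry semantics — and exactly what you don't want if Worker A had already fired off side effects before it died.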
Atomic CTE: One Roundtrip, No Race Window
The fix is to combine SELECT and UPDATE into a single atomic operation using a Common Table Expression (CTE):
```sql
-- Atomic claim: SELECT + UPDATE in one statement
WITH claimable AS (
  SELECT id
  FROM tasks
  WHERE status = 'pending'
    AND scheduled_at <= NOW()
  ORDER BY priority DESC, scheduled_at ASC
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE tasks
SET status = 'running',
    claimed_by = 'worker-1',
    claimed_at = NOW()
FROM claimable
WHERE tasks.id = claimable.id
RETURNING tasks.id, tasks.payload, tasks.metadata;
```

This is a single SQL statement. PostgreSQL executes the CTE and the UPDATE atomically — there is no window between selecting and claiming. The rows are locked, updated, and returned in one roundtrip. No application code runs between "I found these tasks" and "I own these tasks."
The RETURNING clause gives you the claimed task data without a second query. You get back exactly the rows you claimed, already marked as 'running'.
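The one-step shape can be sketched in-memory as well (a hypothetical simulation; a single critical section stands in for PostgreSQL executing one statement): selecting, marking, and returning happen with no gap, so there is no reachable state in which a row is handed out but not yet marked.

```python
import threading

# Six "pending" rows; one lock stands in for PostgreSQL's statement-level
# atomicity in this simulation.
tasks = {i: "pending" for i in range(1, 7)}
claim_lock = threading.Lock()

def atomic_claim(worker, limit):
    # SELECT + UPDATE + RETURNING as a single indivisible step
    with claim_lock:
        picked = [t for t, s in tasks.items() if s == "pending"][:limit]
        for t in picked:
            tasks[t] = "running"
        return [(t, worker) for t in picked]  # RETURNING id, claimed_by

a = atomic_claim("worker-1", 3)
b = atomic_claim("worker-2", 3)
assert {t for t, _ in a}.isdisjoint({t for t, _ in b})
```

A crash can now only land before the claim (nothing handed out) or after it (rows already marked 'running') — never in between.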
How Konduit Implements This
In Konduit, the task acquisition layer uses this atomic CTE pattern. Here's the Kotlin implementation:
```kotlin
// Simplified from Konduit's TaskRepository
fun claimTasks(workerId: String, batchSize: Int): List<Task> {
    return jdbcTemplate.query("""
        WITH claimable AS (
            SELECT id
            FROM tasks
            WHERE status = 'PENDING'
              AND scheduled_at <= NOW()
            ORDER BY priority DESC, scheduled_at ASC
            LIMIT ?
            FOR UPDATE SKIP LOCKED
        )
        UPDATE tasks
        SET status = 'RUNNING',
            claimed_by = ?,
            claimed_at = NOW()
        FROM claimable
        WHERE tasks.id = claimable.id
        RETURNING tasks.*
    """.trimIndent(), taskRowMapper, batchSize, workerId)
}
```

The caller gets a list of tasks that are already claimed. No intermediate state. No retry logic for the claim itself. If two workers call claimTasks simultaneously, they each get a disjoint set of tasks — guaranteed by PostgreSQL's row-level locking.
Benchmarking: Zero Duplicates Under Load
Konduit's test suite verifies this with Testcontainers — real PostgreSQL, real concurrency, real contention:
```kotlin
@Test
fun `concurrent workers never claim the same task`() {
    // Seed 100 pending tasks
    repeat(100) { i -> insertTask(id = i, status = "PENDING") }

    // Launch 3 workers claiming tasks concurrently
    val claimed = ConcurrentHashMap.newKeySet<Int>()
    val duplicates = AtomicInteger(0)

    runBlocking {
        (1..3).map { workerId ->
            async(Dispatchers.IO) {
                while (true) {
                    val batch = repo.claimTasks("worker-$workerId", batchSize = 5)
                    if (batch.isEmpty()) break
                    batch.forEach { task ->
                        if (!claimed.add(task.id)) {
                            duplicates.incrementAndGet() // Should never happen
                        }
                    }
                }
            }
        }.awaitAll()
    }

    assertEquals(100, claimed.size)   // All tasks claimed
    assertEquals(0, duplicates.get()) // Zero duplicates
}
```

This test runs 3 workers pulling from the same queue of 100 tasks in batches of 5. Every task is claimed exactly once. Replace the atomic CTE with a two-step SELECT/UPDATE, and duplicates appear under load — not always, but often enough to be a production incident.
When the Simple Pattern Is Enough
The two-step approach is fine when:
- You have a single worker (no concurrency)
- Tasks are idempotent (duplicate execution is harmless)
- The SELECT and UPDATE are in the same transaction with nothing between them
- You're processing low enough volume that contention is rare
The atomic CTE is worth the slight added complexity when:
- Multiple workers consume from the same queue
- Tasks have side effects (sending emails, charging payments, triggering webhooks)
- You need guarantees under high throughput (100+ tasks/sec)
- You're building infrastructure that other teams depend on
If your task queue is a critical path — not a best-effort background job — use the atomic pattern. The cost is one slightly more complex SQL statement. The benefit is eliminating an entire class of concurrency bugs.
The Rule
If you're building a PostgreSQL-backed task queue with concurrent workers:
- Always combine claim and update into a single atomic CTE
- Never rely on application code between SELECT and UPDATE for correctness
- Always test under real concurrency with real PostgreSQL — mocks won't reveal race conditions
- Always use RETURNING to avoid a second query for the claimed data
PostgreSQL gives you the primitives to build correct concurrent systems. SKIP LOCKED is one of those primitives. But it's only safe when you use it atomically — one statement, one roundtrip, zero race window.
Konduit implements this pattern as part of its distributed workflow orchestration engine — with fan-in coordination, virtual threads, and a test suite of 184 tests running against real PostgreSQL via Testcontainers. See the project page or the source on GitHub.