PostgreSQL's SKIP LOCKED is the go-to pattern for building task queues. Lock a row, skip it if someone else already has it, move on. It works. Most of the time.
But the standard two-step approach — SELECT ... FOR UPDATE SKIP LOCKED, then a separate UPDATE — has a race window. If anything goes wrong between the two statements, rows can be handed out without ever being marked as claimed, and another worker will pick them up again. You end up with duplicate task execution, and no amount of retries fixes a fundamentally broken acquisition pattern.
The Standard SKIP LOCKED Pattern
The textbook approach looks like this:
```sql
-- Step 1: Claim tasks
SELECT id, payload
FROM tasks
WHERE status = 'pending'
  AND scheduled_at <= NOW()
ORDER BY priority DESC, scheduled_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- Step 2: Mark them as claimed
UPDATE tasks
SET status = 'running', claimed_by = 'worker-1', claimed_at = NOW()
WHERE id IN (/* ids from step 1 */);
```

This is what most tutorials show. It's clean, readable, and works perfectly at low concurrency. One worker, one queue — no problem.
The FOR UPDATE SKIP LOCKED clause does two things: it acquires a row-level lock on the selected rows, and it skips any rows that are already locked by another transaction. This means two concurrent SELECT statements will never return the same rows. So far, so good.
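That disjointness guarantee can be sketched with an in-memory stand-in (a hypothetical simulation, not PostgreSQL): each row carries its own lock, a claimer tries each lock without waiting, and two concurrent claimers therefore always walk away with disjoint row sets.

```python
import threading

# Ten "pending" rows, each guarded by its own lock — a stand-in for
# PostgreSQL's row-level locks. This is a simulation, not real SQL.
row_locks = {row_id: threading.Lock() for row_id in range(10)}

def claim(limit):
    """Claim up to `limit` rows, skipping any row whose lock is already held."""
    got = []
    for row_id, lock in row_locks.items():
        # blocking=False is the SKIP LOCKED part: don't wait, just move on
        if lock.acquire(blocking=False):
            got.append(row_id)
            if len(got) == limit:
                break
    return got

results = []
workers = [threading.Thread(target=lambda: results.append(claim(5)))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

a, b = results
assert set(a).isdisjoint(set(b))  # concurrent claimers never overlap
assert len(a) + len(b) == 10      # together they drain the queue
```

Whichever way the two threads interleave, each lock is granted to at most one claimer, so overlap is impossible — the same property PostgreSQL enforces for the two concurrent SELECTs.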
The Race Window
The problem appears when steps 1 and 2 are separate statements. Here's the timeline:
```
Worker A: BEGIN
Worker A: SELECT ... FOR UPDATE SKIP LOCKED  → gets rows [1, 2, 3]
          (rows are locked)
Worker B: BEGIN
Worker B: SELECT ... FOR UPDATE SKIP LOCKED  → skips [1, 2, 3], gets [4, 5, 6]
Worker A: UPDATE ... SET status = 'running'  → marks [1, 2, 3]
Worker A: COMMIT                             → locks released
Worker C: BEGIN
Worker C: SELECT ... FOR UPDATE SKIP LOCKED  → gets [7, 8, 9] ✓ fine
```
This works. But consider what happens when the application layer introduces latency between the SELECT and UPDATE:
```
Worker A: BEGIN
Worker A: SELECT ... FOR UPDATE SKIP LOCKED  → gets rows [1, 2, 3]
-- Application does validation, logging, metric recording...
-- 50ms pass
Worker A: UPDATE ... SET status = 'running'  → marks [1, 2, 3]
Worker A: COMMIT
-- Meanwhile, Worker A's application layer starts processing row 1
-- But Worker A's transaction already committed
-- The row lock is released
Worker B: BEGIN
Worker B: SELECT ... FOR UPDATE SKIP LOCKED  → row 1 is 'running', filtered ✓
```
In this case the WHERE status = 'pending' filter saves you — Worker B won't pick up row 1 because its status is already 'running'. But this relies on the UPDATE having happened and committed before Worker B queries. If Worker A's application crashes between SELECT and UPDATE — or if the UPDATE fails — the rows remain 'pending' with no lock. They'll be picked up again, which is fine for retry semantics but dangerous if your tasks have side effects.
The real risk is more subtle: if your application does anything between claiming tasks and starting execution that could fail, you have a window where tasks are locked but not marked, or marked but not yet processed. This is the gap that leads to duplicate execution, dropped tasks, or both.
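The crash-between-statements case can be made concrete with a small in-memory sketch (hypothetical, not real SQL; row locks are elided because a crashed transaction releases them anyway): a worker that dies between the SELECT and the UPDATE leaves its rows 'pending', so the next claimer is handed the very same tasks.

```python
# In-memory sketch of the two-step claim's failure window.
tasks = {1: "pending", 2: "pending", 3: "pending"}

def two_step_claim(crash_before_update=False):
    # Step 1: SELECT ... WHERE status = 'pending' FOR UPDATE SKIP LOCKED
    claimed = [t for t, s in tasks.items() if s == "pending"]
    if crash_before_update:
        # Crash here: the transaction rolls back, the row locks evaporate,
        # and the UPDATE that would flip the status never runs.
        return claimed
    # Step 2: UPDATE ... SET status = 'running'
    for t in claimed:
        tasks[t] = "running"
    return claimed

first = two_step_claim(crash_before_update=True)  # Worker A claims, then dies
second = two_step_claim()                         # Worker B gets the same rows
assert first == second == [1, 2, 3]  # the same tasks were handed out twice
```

Re-delivery like this is exactly what you want for retry semantics — and exactly what you don't want if Worker A had already fired off side effects before it died.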
Atomic CTE: One Roundtrip, No Race Window
The fix is to combine SELECT and UPDATE into a single atomic operation using a Common Table Expression (CTE):
```sql
-- Atomic claim: SELECT + UPDATE in one statement
WITH claimable AS (
  SELECT id
  FROM tasks
  WHERE status = 'pending'
    AND scheduled_at <= NOW()
  ORDER BY priority DESC, scheduled_at ASC
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE tasks
SET status = 'running',
    claimed_by = 'worker-1',
    claimed_at = NOW()
FROM claimable
WHERE tasks.id = claimable.id
RETURNING tasks.id, tasks.payload, tasks.metadata;
```

This is a single SQL statement. PostgreSQL executes the CTE and the UPDATE atomically — there is no window between selecting and claiming. The rows are locked, updated, and returned in one roundtrip. No application code runs between "I found these tasks" and "I own these tasks."
The RETURNING clause gives you the claimed task data without a second query. You get back exactly the rows you claimed, already marked as 'running'.
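The one-step shape can be sketched in-memory as well (a hypothetical simulation; a single critical section stands in for PostgreSQL executing one statement): selecting, marking, and returning happen with no gap, so there is no reachable state in which a row is handed out but not yet marked.

```python
import threading

# Six "pending" rows; one lock stands in for PostgreSQL's statement-level
# atomicity in this simulation.
tasks = {i: "pending" for i in range(1, 7)}
claim_lock = threading.Lock()

def atomic_claim(worker, limit):
    # SELECT + UPDATE + RETURNING as a single indivisible step
    with claim_lock:
        picked = [t for t, s in tasks.items() if s == "pending"][:limit]
        for t in picked:
            tasks[t] = "running"
        return [(t, worker) for t in picked]  # RETURNING id, claimed_by

a = atomic_claim("worker-1", 3)
b = atomic_claim("worker-2", 3)
assert {t for t, _ in a}.isdisjoint({t for t, _ in b})
```

A crash can now only land before the claim (nothing handed out) or after it (rows already marked 'running') — never in between.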
How Konduit Implements This
In Konduit, the task acquisition layer uses this atomic CTE pattern. Here's the Kotlin implementation:
```kotlin
// Simplified from Konduit's TaskRepository
fun claimTasks(workerId: String, batchSize: Int): List<Task> {
    return jdbcTemplate.query("""
        WITH claimable AS (
            SELECT id
            FROM tasks
            WHERE status = 'PENDING'
              AND scheduled_at <= NOW()
            ORDER BY priority DESC, scheduled_at ASC
            LIMIT ?
            FOR UPDATE SKIP LOCKED
        )
        UPDATE tasks
        SET status = 'RUNNING',
            claimed_by = ?,
            claimed_at = NOW()
        FROM claimable
        WHERE tasks.id = claimable.id
        RETURNING tasks.*
    """.trimIndent(), taskRowMapper, batchSize, workerId)
}
```

The caller gets a list of tasks that are already claimed. No intermediate state. No retry logic for the claim itself. If two workers call claimTasks simultaneously, they each get a disjoint set of tasks — guaranteed by PostgreSQL's row-level locking.
Benchmarking: Zero Duplicates Under Load
Konduit's test suite verifies this with Testcontainers — real PostgreSQL, real concurrency, real contention:
```kotlin
@Test
fun `concurrent workers never claim the same task`() {
    // Seed 100 pending tasks
    repeat(100) { i -> insertTask(id = i, status = "PENDING") }

    // Launch 3 workers claiming tasks concurrently
    val claimed = ConcurrentHashMap.newKeySet<Int>()
    val duplicates = AtomicInteger(0)

    runBlocking {
        (1..3).map { workerId ->
            async(Dispatchers.IO) {
                while (true) {
                    val batch = repo.claimTasks("worker-$workerId", batchSize = 5)
                    if (batch.isEmpty()) break
                    batch.forEach { task ->
                        if (!claimed.add(task.id)) {
                            duplicates.incrementAndGet() // Should never happen
                        }
                    }
                }
            }
        }.awaitAll()
    }

    assertEquals(100, claimed.size)   // All tasks claimed
    assertEquals(0, duplicates.get()) // Zero duplicates
}
```

This test runs 3 workers pulling from the same queue of 100 tasks in batches of 5. Every task is claimed exactly once. Replace the atomic CTE with a two-step SELECT/UPDATE, and duplicates appear under load — not always, but often enough to be a production incident.
When the Simple Pattern Is Enough
The two-step approach is fine when:
- You have a single worker (no concurrency)
- Tasks are idempotent (duplicate execution is harmless)
- The SELECT and UPDATE are in the same transaction with nothing between them
- You're processing low enough volume that contention is rare
The atomic CTE is worth the slight added complexity when:
- Multiple workers consume from the same queue
- Tasks have side effects (sending emails, charging payments, triggering webhooks)
- You need guarantees under high throughput (100+ tasks/sec)
- You're building infrastructure that other teams depend on
If your task queue is a critical path — not a best-effort background job — use the atomic pattern. The cost is one slightly more complex SQL statement. The benefit is eliminating an entire class of concurrency bugs.
The Rule
If you're building a PostgreSQL-backed task queue with concurrent workers:
- Always combine claim and update into a single atomic CTE
- Never rely on application code between SELECT and UPDATE for correctness
- Always test under real concurrency with real PostgreSQL — mocks won't reveal race conditions
- Always use RETURNING to avoid a second query for the claimed data
PostgreSQL gives you the primitives to build correct concurrent systems. SKIP LOCKED is one of those primitives. But it's only safe when you use it atomically — one statement, one roundtrip, zero race window.
Konduit implements this pattern as part of its distributed workflow orchestration engine — with fan-in coordination, virtual threads, and a test suite of 184 tests running against real PostgreSQL via Testcontainers. See the project page or the source on GitHub.