Scheduled jobs keep small teams moving: nightly syncs, hourly exports, weekly reports, and “every five minutes” event processors. They are also a classic source of quiet chaos: the same record written twice, a partial run that leaves data half-updated, or a retry that magnifies a mistake.
The antidote is idempotency. An idempotent job can run more than once for the same input without changing the outcome after the first successful run. It is the difference between “rerun it and hope” and “rerun it with confidence.”
This post focuses on scheduled and batch-style automations (cron, queue workers, GitHub Actions, serverless schedules). The goal is simple: build your job so that retries, backfills, and occasional double-triggering do not create duplicates or corrupt state.
Why idempotency matters for scheduled automations
In production, “runs exactly once” is an aspiration, not a guarantee. Jobs can overlap, workers can crash, network calls can time out, and schedulers can fire twice during deploys. Even when everything is stable, you will eventually want to backfill historical data or recompute results after a bug fix.
Idempotency turns these situations into routine operations:
- Retries are safe: a failed run can be re-run without manual cleanup.
- Backfills are safe: you can reprocess an older time range without duplicates.
- Concurrency is less scary: two runs can overlap and still converge on the same end state.
- Support is easier: when something looks off, your first move can be “re-run job for that window” instead of “write a one-off fix.”
Idempotency is not just a database trick. It is a design choice that affects how you identify work, store progress, and apply updates.
Define your unit of work and your idempotency key
Most automation bugs start with a fuzzy definition of “what the job does.” Before you think about locking or retries, define the job’s unit of work: the smallest chunk that you can process independently and safely.
Examples of units of work:
- “Sync one invoice by invoice_id.”
- “Process one webhook event by event_id.”
- “Compute one customer’s weekly summary for week_start.”
- “Import one CSV row identified by source_file + row_number.”
Choosing an idempotency key
An idempotency key is a stable identifier for the unit of work. A good key has three properties:
- Deterministic: the same work yields the same key every time.
- Unique enough: it does not collide across different work items.
- Cheap to check: you can quickly ask “have we done this already?”
Common key shapes include:
- Natural IDs: invoice_id, ticket_id, event_id.
- Composite keys: customer_id + YYYY-MM-DD for a daily summary.
- Content-derived: a hash of the payload when no stable ID exists (use carefully and store the hash).
One helpful mindset: if you cannot explain your idempotency key in a single sentence, your unit of work is probably not well-defined yet.
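To make that concrete, here is a minimal Python sketch of the three key shapes; the function names and key formats are illustrative conventions, not requirements.

```python
import hashlib
import json

def natural_key(invoice_id: str) -> str:
    # Natural ID: the source system already provides a stable identifier.
    return f"invoice:{invoice_id}"

def composite_key(customer_id: str, week_start: str) -> str:
    # Composite key: entity plus time window, e.g. "summary:cust_42:2024-01-01".
    return f"summary:{customer_id}:{week_start}"

def content_key(payload: dict) -> str:
    # Content-derived key: hash a *normalized* payload so key order and
    # whitespace differences do not produce different keys.
    normalized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "payload:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```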
Patterns that make jobs safe to re-run
There are several ways to make a job idempotent. You can mix and match, but try to keep the logic obvious so future maintainers do not accidentally remove the safety rails.
Pattern 1: Upsert, do not insert
If your job writes records that have a stable identity, prefer an upsert (create or update) over a blind insert. In database terms, this often means a unique constraint on the idempotency key and an “insert on conflict update” behavior.
Even without SQL, the principle holds: write to a destination keyed by the stable identifier so repeated writes overwrite or merge instead of duplicating.
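For illustration, a minimal sketch using SQLite's upsert syntax (PostgreSQL's INSERT ... ON CONFLICT and MySQL's INSERT ... ON DUPLICATE KEY UPDATE are equivalents); the table and columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect("sync.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS invoices (
        invoice_id TEXT PRIMARY KEY,   -- the idempotency key, enforced as unique
        amount_cents INTEGER,
        status TEXT,
        updated_at TEXT
    )
""")

def upsert_invoice(inv: dict) -> None:
    # Re-running this for the same invoice_id overwrites instead of duplicating.
    conn.execute(
        """
        INSERT INTO invoices (invoice_id, amount_cents, status, updated_at)
        VALUES (:invoice_id, :amount_cents, :status, :updated_at)
        ON CONFLICT(invoice_id) DO UPDATE SET
            amount_cents = excluded.amount_cents,
            status = excluded.status,
            updated_at = excluded.updated_at
        """,
        inv,
    )
    conn.commit()
```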
Pattern 2: Keep a processing ledger
For jobs that trigger side effects (send an email, create a task, post a message), it is useful to keep a simple ledger table or collection of processed keys. The job checks the ledger before executing the side effect, and records the key after success.
Important detail: record enough context to debug later (timestamp, job version, relevant entity IDs), but do not turn the ledger into a second system of record.
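One possible shape for such a ledger, sketched again with SQLite; the schema is illustrative, and send_email is a stand-in for whatever side effect you are guarding.

```python
import sqlite3

conn = sqlite3.connect("sync.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS processed_keys (
        key TEXT PRIMARY KEY,      -- the idempotency key for the side effect
        job_version TEXT,
        entity_id TEXT,
        processed_at TEXT
    )
""")

def already_processed(conn, key: str) -> bool:
    return conn.execute(
        "SELECT 1 FROM processed_keys WHERE key = ?", (key,)
    ).fetchone() is not None

def record_processed(conn, key: str, job_version: str, entity_id: str) -> None:
    # Record enough context to debug later, but keep the ledger small.
    conn.execute(
        "INSERT INTO processed_keys (key, job_version, entity_id, processed_at) "
        "VALUES (?, ?, ?, datetime('now'))",
        (key, job_version, entity_id),
    )
    conn.commit()

def send_notification_once(conn, invoice_id: str, send_email) -> None:
    key = f"notify:new-invoice:{invoice_id}"
    if already_processed(conn, key):
        return                     # side effect already happened on an earlier run
    send_email(invoice_id)         # hypothetical side effect
    record_processed(conn, key, job_version="v1", entity_id=invoice_id)
```

Note the gap between sending and recording: a crash in between can resend. The reserve-then-send variant under common mistakes below closes that gap, at the cost of a possible missed send.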
Pattern 3: Use watermarks for time windows
Many scheduled jobs process a time range, for example “everything updated since the last run.” This can be safe if you track a watermark and process with overlap.
- Store a high-water mark you have fully processed (like an updated_at timestamp or increasing sequence).
- On each run, query starting from (watermark minus a small overlap) to re-check recent items.
- Combine this with upsert so reprocessing the overlap is harmless.
This pattern avoids missing records due to clock drift or late-arriving updates, at the cost of deliberate reprocessing.
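A sketch of that loop, assuming a hypothetical store for persisting the watermark between runs and a fetch_updated_since function that queries the source:

```python
from datetime import datetime, timedelta

OVERLAP = timedelta(minutes=10)   # deliberate re-read window; tune to your source

def run_window_sync(store, fetch_updated_since, upsert):
    # store.get/.set persist the watermark between runs (file, table, etc.).
    watermark = datetime.fromisoformat(store.get("invoices_watermark"))
    since = watermark - OVERLAP   # re-check recent items on purpose

    max_seen = watermark
    for record in fetch_updated_since(since):
        upsert(record)            # reprocessing the overlap is harmless with upsert
        updated = datetime.fromisoformat(record["updated_at"])
        max_seen = max(max_seen, updated)

    # Advance only after the whole window succeeded.
    store.set("invoices_watermark", max_seen.isoformat())
```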
Pattern 4: Stage, then finalize
If your job transforms a lot of data, consider a two-step write:
- Write results to a staging area keyed by the unit of work.
- Finalize by marking the unit as “done” and publishing the staged result (or swapping pointers).
This reduces partial updates when a run crashes mid-way. It also makes it easier to retry because unfinished units can be detected and resumed.
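A sketch under the assumption of staging and published tables keyed uniquely by the unit of work (all names hypothetical):

```python
import sqlite3

conn = sqlite3.connect("sync.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS staging (
        unit_key TEXT PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0);
    CREATE TABLE IF NOT EXISTS published (
        unit_key TEXT PRIMARY KEY, payload TEXT);
""")

def process_unit(conn, unit_key: str, compute_result) -> None:
    # Step 1: stage the result, keyed by the unit of work (safe to overwrite).
    payload = compute_result(unit_key)
    conn.execute(
        "INSERT INTO staging (unit_key, payload, done) VALUES (?, ?, 0) "
        "ON CONFLICT(unit_key) DO UPDATE SET payload = excluded.payload, done = 0",
        (unit_key, payload),
    )
    conn.commit()

    # Step 2: publish and mark done in one transaction, so a crash leaves the
    # unit resumable instead of half-published.
    with conn:
        conn.execute(
            "INSERT INTO published (unit_key, payload) VALUES (?, ?) "
            "ON CONFLICT(unit_key) DO UPDATE SET payload = excluded.payload",
            (unit_key, payload),
        )
        conn.execute("UPDATE staging SET done = 1 WHERE unit_key = ?", (unit_key,))

def resume_unfinished(conn):
    # Units with done = 0 crashed mid-run and can simply be re-processed.
    return [r[0] for r in conn.execute("SELECT unit_key FROM staging WHERE done = 0")]
```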
Conceptual job structure:
1) Identify work items (deterministic list)
2) For each item:
   - compute idempotency_key
   - if already processed: skip
   - write destination via upsert
   - record processed_key (or mark item complete)
3) Advance watermark only after all items succeed
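Using the illustrative helpers from the sketches above (already_processed, record_processed, and a hypothetical store for the watermark), that structure might look like:

```python
def run_job(conn, store, list_items_since, upsert_destination):
    # 1) Identify work items deterministically (same watermark -> same list).
    items = list_items_since(store.get("watermark"))

    for item in items:
        # 2) Compute the key, skip finished work, upsert, then record success.
        key = f"invoice:{item['invoice_id']}"
        if already_processed(conn, key):
            continue
        upsert_destination(item)   # duplicate-safe write
        record_processed(conn, key, job_version="v1",
                         entity_id=item["invoice_id"])

    # 3) Advance the watermark only after every item succeeded.
    if items:
        # ISO timestamps compare correctly as strings.
        store.set("watermark", max(item["updated_at"] for item in items))
```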
Key takeaways
- Pick a clear unit of work and a stable idempotency key before you write logic.
- Prefer upserts and unique constraints so duplicates are prevented by design.
- Use a ledger for side effects, and watermarks (with overlap) for time-window jobs.
- Advance “progress” only after you know work is complete, not when it starts.
Real-world example: an invoice sync that never duplicates
Imagine a small operations team that needs invoices from a billing system copied into their internal database so they can join it with fulfillment and support data. A scheduled job runs every hour:
- Fetch invoices updated since the last run.
- Write each invoice to the database.
- Emit a “new invoice” notification for invoices that were created (not just updated).
Without idempotency, a crash after writing 30 invoices means the re-run inserts duplicates for those 30, and a notification can fire twice if the job repeats the same hour window.
Make it safe with three decisions:
- Unit of work: one invoice identified by invoice_id.
- Destination write: upsert the invoice row keyed by invoice_id.
- Side-effect ledger: store a notification record keyed by notify:new-invoice:{invoice_id}.
Now the hourly job can safely overlap windows. It can also intentionally re-run yesterday’s window after a bug fix. The database converges to one row per invoice, and the notification ledger ensures each invoice triggers at most one “new” notification.
A practical nuance: “new invoice” may be based on a field (like created_at or a status transition). If that logic changes later, you might want the ledger key to include a rule version, such as notify:new-invoice:v2:{invoice_id}, so you can intentionally re-notify under a new policy.
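Sketching just that guard, reusing the illustrative ledger helpers from earlier; the rule check and the notification call are passed in as stand-ins:

```python
NOTIFY_RULE_VERSION = "v2"   # bump deliberately when the "new invoice" rule changes

def maybe_notify(conn, invoice: dict, is_new_invoice, post_notification) -> None:
    if not is_new_invoice(invoice):        # e.g. a created_at or status check
        return
    key = f"notify:new-invoice:{NOTIFY_RULE_VERSION}:{invoice['invoice_id']}"
    if already_processed(conn, key):
        return
    post_notification(invoice)             # fires at most once per rule version
    record_processed(conn, key, NOTIFY_RULE_VERSION, invoice["invoice_id"])
```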
Common mistakes (and how to avoid them)
- Using “now” as part of the key. If your key contains the run timestamp, re-runs look like new work and you lose idempotency. Key the work to the entity or the window, not the run.
- Advancing the watermark too early. If you store “last_processed_time” at the start of a run, a crash can create a gap. Move it forward only after the run completes successfully.
- Assuming API pagination is stable. If you page by offset, inserts can shift pages and cause duplicates or misses. Prefer stable cursors when available, or query by updated_at with overlap and upsert.
- Skipping unique constraints. Relying on “we will not insert duplicates in code” is fragile. Let the destination enforce uniqueness where possible.
- Making the ledger write non-atomic. If you “send notification” and then “record sent,” a crash between them can resend. Reverse the order (reserve, then send, as sketched below) or use a transactional approach where feasible.
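A sketch of the reserve-then-send ordering against the illustrative processed_keys table from earlier: insert the ledger row first so the unique constraint rejects duplicates, then perform the side effect.

```python
import sqlite3

def reserve_then_send(conn, key: str, send) -> None:
    try:
        # Reserve first: the PRIMARY KEY on processed_keys.key makes a second
        # concurrent reservation fail instead of silently duplicating.
        with conn:
            conn.execute("INSERT INTO processed_keys (key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        return  # another run already reserved (and presumably sent) this one

    # A crash here means the side effect never happens. That is the trade-off:
    # prefer a missed send plus a reconciliation pass over a duplicate send.
    send(key)
```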
Most of these mistakes come from treating the job as a linear script instead of a small distributed system. Even “simple cron jobs” have retries, partial failures, and concurrency.
When not to use an idempotent job pattern
Idempotency is widely useful, but it is not free. A few cases where you might choose a different approach:
- True streaming requirements: If you need strict ordering and exactly-once semantics across multiple partitions, you may need a dedicated streaming platform and stronger guarantees than a cron job can offer.
- Very high throughput with tiny margins: A ledger check per item might be too expensive at extreme scale. You may prefer batch-level dedupe with periodic reconciliation, or rely on destination constraints only.
- Irreversible side effects without a safe guard: If an action cannot be repeated or undone and you cannot reliably ledger it (for example, calling a third-party endpoint with no idempotency support), consider adding a human approval step or redesigning the workflow.
Even in these situations, you can usually adopt the spirit of idempotency: deterministic inputs, explicit progress tracking, and a destination that can reject duplicates.
Copyable checklist: idempotent job readiness
Use this checklist when designing or reviewing a scheduled automation:
- Unit of work defined: I can describe it in one sentence, and it maps to one key.
- Idempotency key chosen: Deterministic, unique enough, and stable across re-runs.
- Destination prevents duplicates: Upsert semantics or unique constraint exists on the key.
- Side effects are guarded: Notifications, tickets, and emails are protected by a ledger or reservation mechanism.
- Progress tracking is safe: Watermark advances only after success; overlap is intentional.
- Retries are expected: Job can be re-run for a specific window or key without manual cleanup.
- Concurrency behavior is understood: Two runs overlapping still converge on correct state.
- Observability is sufficient: You log counts of processed, skipped, failed, and deduped items.
- Backfill plan exists: You can process historical data without special-case scripts.
If you cannot check at least six of these boxes, treat the job as risky and plan time to harden it before expanding its responsibilities.
Conclusion
Idempotent scheduled jobs are a reliability multiplier. They reduce operational anxiety, make retries and backfills routine, and prevent the subtle data drift that eats time later.
Start small: pick a clear unit of work, choose a stable key, upsert into a destination that rejects duplicates, and guard side effects with a ledger. Once those foundations are in place, your automation can be safely re-run whenever reality demands it.
FAQ
Is idempotency the same as “exactly once” processing?
Not quite. Idempotency means repeated processing produces the same end state, even if the work happens more than once. “Exactly once” is a stronger guarantee about how many times processing occurs. In practice, idempotency is often the most attainable and useful property for scheduled jobs.
What if the third-party API does not provide stable IDs?
Prefer composite keys (like email + created_at) if they are stable, or store a hash of the normalized payload as a best-effort identifier. If neither is safe, consider switching to a workflow where you ingest into a raw append-only store first, then deduplicate during downstream processing.
Should I store a ledger forever?
Not always. If your idempotency keys are tied to stable destination records (upsert with a unique key), you may not need a separate ledger for that part. For side effects, you can often expire ledger entries after a retention period that matches your operational needs, as long as re-sending after expiry is acceptable.
How do I handle job version changes or logic changes?
When a logic change must reprocess work, make it explicit. Common approaches include backfilling a date range, bumping a “rule version” in ledger keys for side effects, or adding a “computed_version” field in the destination so you can detect and recompute outdated records.