Many automations fail for boring reasons: a webhook arrives twice, an API times out, a token expires, or two systems disagree about what “created” means. The failure is rarely dramatic, but it is expensive because it creates manual cleanup work and a growing fear of touching the workflow.
A webhook-first approach helps you design automations that behave like a small, dependable product instead of a fragile one-off script. The idea is to treat incoming events as the source of truth, store them safely, and process them in a controlled way with clear retries and a paper trail.
This post explains the pattern in plain terms. It is platform-agnostic: you can implement it with a serverless function, a lightweight worker, or an automation tool as long as you keep the same core guarantees.
What “webhook-first” means
Webhook-first means you start with events, not schedules. Instead of polling an API every 5 minutes to see whether something changed, you accept a push notification (a webhook) from the system that knows the change happened.
The trap is assuming “receive webhook” equals “do the work immediately.” In reliable systems, receiving an event is only step one. Step two is recording the event and acknowledging receipt quickly. Step three is processing it asynchronously with retries, idempotency, and clear outcomes.
The big benefits:
- Speed: changes propagate quickly without frequent polling.
- Lower cost: fewer wasted API calls and less background infrastructure.
- Better correctness: you can build in deduplication and a consistent audit trail.
The core building blocks
A webhook-first automation is usually four small components. You can combine them into one service, but separating the responsibilities makes reliability easier to reason about.
1) Ingest, store, acknowledge
Your webhook handler should do as little as possible:
- Verify the request (shared secret, signature, or token).
- Normalize and store the payload (or a reference to it) as an immutable “event record.”
- Return a fast success response to the sender.
Fast acknowledgement matters because many providers will retry if you are slow, and slow responses can create duplicated work or bursty traffic.
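A minimal sketch of such a handler, framework-free, with an in-memory dict standing in for the event store. The HMAC signature scheme and field names are assumptions; real providers differ in how they sign requests:

```python
import hashlib
import hmac
import json
import time

def handle_webhook(secret: bytes, signature: str, raw_body: bytes, event_store: dict) -> int:
    """Verify, store, acknowledge. No downstream calls happen here."""
    # 1) Verify the request with an HMAC signature (scheme varies by provider).
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return 401  # reject unsigned or forged requests

    # 2) Store the payload as an immutable event record, keyed for dedup.
    payload = json.loads(raw_body)
    event_id = payload.get("event_id") or hashlib.sha256(raw_body).hexdigest()
    event_store.setdefault(event_id, {
        "received_at": time.time(),
        "payload": payload,
        "status": "pending",
    })

    # 3) Acknowledge fast; a separate worker processes "pending" events.
    return 200
```

Note that a duplicate delivery simply hits the `setdefault` and returns 200 again, which is exactly what the sender wants to see.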
2) Queue plus worker
After storing the event, you place a lightweight job onto a queue. A worker then processes that job with controlled concurrency and retries. If you do not have a queue, you can approximate it with a “pending events” table and a periodic worker that claims work, but the conceptual model stays the same: do not process directly inside the webhook request.
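If all you have is a database, the “pending events” approximation might look like this sketch, using SQLite for illustration. The conditional UPDATE is what makes the claim safe: a row can only move from pending to processing once, so two workers cannot grab the same event. Table and column names are assumptions:

```python
import sqlite3

def claim_next(conn: sqlite3.Connection):
    """Claim one pending event; the conditional UPDATE acts as the lock."""
    row = conn.execute(
        "SELECT id, payload FROM events WHERE status = 'pending' LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    event_id, payload = row
    claimed = conn.execute(
        "UPDATE events SET status = 'processing' WHERE id = ? AND status = 'pending'",
        (event_id,),
    ).rowcount
    conn.commit()
    return (event_id, payload) if claimed else None

def drain(conn: sqlite3.Connection, handler) -> int:
    """Process claimed events one by one, recording each outcome."""
    count = 0
    while (job := claim_next(conn)) is not None:
        event_id, payload = job
        try:
            handler(payload)
            conn.execute("UPDATE events SET status = 'processed' WHERE id = ?", (event_id,))
        except Exception as exc:
            conn.execute(
                "UPDATE events SET status = 'retrying', last_error = ? WHERE id = ?",
                (str(exc), event_id),
            )
        conn.commit()
        count += 1
    return count
```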
3) Idempotency and deduplication
Assume every event can arrive multiple times. Your processing must be safe to run twice without creating duplicate CRM records, double-charging, or sending the same email repeatedly.
Practical ways to do that:
- Event ID dedup: store a unique event identifier and mark it processed once complete.
- Idempotency keys for outgoing calls: when calling downstream APIs, use a stable key (for example, sourceSystem + objectId + action) so repeated attempts do not create duplicates.
- Upsert instead of create: prefer “create or update” semantics when available.
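A minimal sketch of the key-guard idea, with a set standing in for a persistent table of completed keys:

```python
def idempotency_key(source_system: str, object_id: str, action: str) -> str:
    """A stable key per real-world action, never per attempt."""
    return f"{source_system}:{object_id}:{action}"

def run_once(key: str, done: set, side_effect) -> bool:
    """Run side_effect at most once per key; repeats are no-ops.

    'done' stands in for persistent storage. Note the order: the key is
    recorded only after the call succeeds, so a crash in between leads
    to a retry, not a lost action (at-least-once semantics).
    """
    if key in done:
        return False
    side_effect()
    done.add(key)
    return True
```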
4) Retries, backoff, and a dead-letter path
Retries are not optional; the internet is unreliable. But retries need shape; without it, you create thundering herds and inconsistent state.
- Retry only what is retryable: timeouts, 429 rate limits, and transient 5xx errors.
- Backoff: wait longer between attempts (for example, 30s, 2m, 10m).
- Dead-letter: after N failed attempts, stop retrying and mark the event “needs review,” with enough context for a human to resolve it.
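These rules fit in one small decision function. The retryable status codes and the schedule below are starting-point assumptions, not a standard; adjust them per provider:

```python
BACKOFF_SECONDS = [30, 120, 600, 1800, 3600]  # 30s, 2m, 10m, 30m, 1h

def next_step(status_code: int, attempt: int):
    """Classify a failed downstream call: retry with backoff, or dead-letter.

    status_code 0 stands in for a network timeout here; only plausibly
    transient failures (timeouts, 429, 5xx) earn a retry.
    """
    retryable = status_code in (0, 429) or 500 <= status_code < 600
    if not retryable:
        return ("dead_letter", f"terminal error: {status_code}")
    if attempt >= len(BACKOFF_SECONDS):
        return ("dead_letter", "max attempts exceeded")
    return ("retry", BACKOFF_SECONDS[attempt])
```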
Taken together, the pattern comes down to a few habits:
- Separate “receive webhook” from “process event” so you can retry safely.
- Store every event as an immutable record, then build idempotent processing on top.
- Use explicit outcomes: processed, ignored, retrying, dead-lettered.
- Design for duplicates and partial failures from the beginning.
A concrete example: lead intake to CRM and email
Imagine a small service business with a website contact form, a CRM, and an email system. The goal is: when someone submits the form, create or update a CRM contact, create a deal, and send a confirmation email.
A webhook-first design might look like this:
Event: website.form_submitted
Store: event_id, received_at, payload_hash, payload_json
Process:
1) Upsert CRM contact by email (idempotency key: "contact:{email}")
2) Upsert deal by contact + form_id (key: "deal:{email}:{form_id}")
3) Send confirmation email once (key: "email:{event_id}")
Outcome: processed | retrying | dead_letter
Notice the keys: the workflow is intentionally safe to rerun. If the CRM call succeeds but the email fails, you retry without creating another contact or deal. If the website re-sends the same webhook (common), the event ID and payload hash prevent duplicate processing.
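A sketch of the processing step under those keys; `crm` and `mailer` are hypothetical client objects, and a set again stands in for persistent key storage:

```python
def process_form_submission(event: dict, crm, mailer, done: set) -> str:
    """Run the three steps, each guarded by its own idempotency key.

    If a later step raises, the earlier completed keys stay recorded,
    so a retry skips straight to the step that failed.
    """
    email = event["payload"]["email"]
    form_id = event["payload"]["form_id"]
    steps = [
        (f"contact:{email}", lambda: crm.upsert_contact(email)),
        (f"deal:{email}:{form_id}", lambda: crm.upsert_deal(email, form_id)),
        (f"email:{event['event_id']}", lambda: mailer.send_confirmation(email)),
    ]
    for key, action in steps:
        if key not in done:
            action()
            done.add(key)
    return "processed"
```

Running this twice on the same event performs each side effect exactly once, which is the property the keys exist to guarantee.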
Handling real edge cases
This is where reliability is won or lost. A few common situations and how the pattern helps:
- Webhook arrives twice: your event record shows it already processed, so the second run becomes a no-op.
- CRM is down temporarily: the worker retries with backoff; no manual intervention required.
- Email provider returns 429: retry later, but keep the CRM updates, since those already completed.
- Bad payload: mark dead-letter with a clear reason (missing email, invalid JSON, unexpected schema). A human can fix the source form configuration rather than guessing.
The result is not just fewer failures. It is fewer “mystery failures,” because each event has a visible lifecycle.
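The duplicate and bad-payload cases both hinge on what “same event” means. A minimal sketch of the decision, assuming event ID and payload hash were both recorded at ingest:

```python
import hashlib
import json

def classify_delivery(event_id: str, payload: dict, seen: dict) -> str:
    """Decide how to treat an incoming delivery.

    'seen' maps event_id -> payload hash of the stored record; a hash
    mismatch under the same id is suspicious enough for human review.
    """
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    if event_id not in seen:
        seen[event_id] = digest
        return "new"
    if seen[event_id] == digest:
        return "duplicate"      # safe no-op
    return "needs_review"       # same id, different body
```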
A reliability checklist you can copy
Use this as a pre-flight check before you call an automation “done.” Even small workflows benefit from basic operational discipline.
- Webhook security: verify signatures or secrets; reject unsigned requests.
- Fast acknowledge: return success quickly after storing the event record.
- Immutable event log: store raw payloads (or references) and timestamps.
- Deduplication: define what “same event” means (event ID, payload hash, or both).
- Idempotent processing: ensure each step can run multiple times safely.
- State tracking: processed, retrying, dead-letter; record attempt counts and last error.
- Retry policy: which errors retry, backoff schedule, and max attempts.
- Dead-letter review: a human-friendly way to inspect and requeue after fixing issues.
- Audit trail: log downstream IDs (CRM contact ID, deal ID, email message ID) into the event record.
- Change safety: add a version field to your event normalization so you can evolve payload handling.
If you have to pick only three items: event log, idempotency, and a dead-letter path. Those turn “random breakages” into “known queue items.”
Common mistakes (and how to avoid them)
Most webhook automations break in repeatable ways. Here are the patterns that show up again and again in small teams.
- Doing everything inside the webhook request. If the handler calls three APIs and one times out, you will get retries and duplicates. Fix: store and enqueue, then process out-of-band.
- No idempotency strategy. People notice this only after they see duplicated CRM entries. Fix: define stable keys per step, not just per workflow.
- Retrying non-retryable errors. If the payload is invalid, retries waste time. Fix: classify errors into retryable and terminal, and record the reason.
- Silent failures. If dead-letter events are not visible, they become invisible backlog. Fix: a simple dashboard or periodic review list (even an internal page) is enough.
- Over-trusting “success” responses. Some APIs respond 200 but still fail later, or accept work asynchronously. Fix: where possible, store downstream identifiers and confirm state with a follow-up read for critical steps.
When not to use this pattern
Webhook-first is powerful, but it is not always the simplest choice. Avoid it when:
- You cannot receive inbound traffic. If your environment cannot accept webhooks at all, a polling job may be the only option.
- The source system has unreliable webhooks. Some tools miss events or provide incomplete payloads. In that case, you might need a hybrid: webhook as a hint, plus periodic reconciliation polling.
- The workflow is purely internal and batch-based. For nightly exports, a schedule is straightforward and easier to reason about.
- Ultra-low latency is required. If you need sub-second actions and you cannot tolerate queue delay, you may still use webhooks, but you will design a more direct path with careful safeguards.
A good rule: if missing an event causes serious harm, plan for reconciliation. Webhooks are great signals, but “trust and verify” is how systems stay correct over time.
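Reconciliation can stay very small: periodically list recent object IDs from the source and diff them against the events you stored. A sketch, assuming both sides can produce ID sets for the same time window:

```python
def find_missed_events(source_ids: set, local_ids: set) -> set:
    """IDs the source knows about but we never received via webhook.

    Each missed ID can be fetched and enqueued exactly like a normal
    incoming event, so it flows through the same idempotent pipeline.
    """
    return source_ids - local_ids
```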
Conclusion
Reliable automations are less about cleverness and more about structure. A webhook-first pattern gives you that structure by separating receipt from processing, recording immutable events, and building idempotent steps with explicit retry and dead-letter behavior.
If you adopt only one habit, make it this: treat every webhook as an event you can replay safely. That mindset turns messy integrations into workflows you can maintain confidently.
FAQ
Do I need a message queue to do webhook-first?
No. A queue helps, but the essential idea is decoupling: store the event, then process it separately. A database table plus a worker that claims “pending” rows can work for smaller workloads.
How do I choose an idempotency key?
Pick a key that represents the real-world action, not the attempt. For example, “create deal for this form submission” is stable, while “attempt #3” is not. If the source provides an event ID, use it as part of the key.
What should I store in an event record?
At minimum: event ID (if provided), received timestamp, raw payload (or a reference), processing status, attempt count, last error, and any downstream IDs created. That is enough to debug and reprocess responsibly.
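As a sketch, those fields map onto a record like this (names illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EventRecord:
    event_id: str                # from the source, or a payload hash
    received_at: float           # unix timestamp at ingest
    payload: dict                # raw payload, or a reference to blob storage
    status: str = "pending"      # pending | processed | retrying | dead_letter
    attempts: int = 0
    last_error: Optional[str] = None
    downstream_ids: dict = field(default_factory=dict)  # e.g. {"crm_contact_id": "..."}
```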
How many retries are reasonable?
Enough to cover common transient failures, but not so many that you delay human review. Many teams start with 5 to 8 attempts over a few hours with backoff, then dead-letter with a clear reason.