Most small automations start as a quick script: pull data from a spreadsheet, call an API, post a message, repeat. The first version works, and then reality shows up. A new channel needs notifications. A different team wants a different threshold. Someone asks for a quiet period on weekends.
If every change requires editing code, redeploying, and re-testing the full flow, your automation turns into a fragile mini-product. Configuration-first automation is a practical way to keep that complexity under control: put the frequently changing decisions into a human-editable config file, while keeping the execution logic stable.
This post focuses on YAML as the configuration format because it is readable and easy to review in version control. The principles apply equally to JSON or a database table.
Why configuration-first beats hardcoding
Hardcoded values are convenient in the moment and expensive over time. The more your script grows, the more “tiny tweaks” become risky because they require code changes, code review, and a deploy. Configuration-first reduces that churn by separating policy from mechanism.
- Policy: What should happen and under what conditions (thresholds, routing, schedules, templates).
- Mechanism: How to do it (fetch records, transform data, call APIs, handle retries, log outcomes).
This separation gives you three major benefits: faster iteration (edit config instead of code), safer changes (review diff of values and rules), and clearer ownership (non-engineers can propose changes while engineers protect the core logic).
Key Takeaways
- Put business decisions in config and keep execution in code.
- Design config for reviewability: defaults, limits, and clear naming.
- Validate config before any real work happens, and fail closed on unsafe values.
- Use a “dry run” mode so config changes can be tested without side effects.
What belongs in config vs code
A useful rule: if a value changes because the business changes, it belongs in config. If it changes because technology changes, it belongs in code. This is not perfect, but it is a good starting point.
Good candidates for configuration
- Destinations: which Slack channel, which email list, which queue.
- Thresholds: “notify if over 10”, “flag if error rate exceeds 2%”.
- Time windows: business hours, quiet hours, weekend behavior.
- Routing rules: team assignment based on region, product, account tier.
- Message templates: subject lines, short text blocks, labels.
- Feature toggles: enable or disable a module without deploying new code.
Good candidates for code
- Authentication and secret handling (tokens should not live in YAML).
- API client behavior: retries, timeouts, pagination.
- Data transformations: normalization, mapping, deduplication logic.
- Safety constraints: max messages per run, allowed domains, schema validation.
- Observability: logging shape, metrics emission, correlation IDs.
When in doubt, bias toward keeping complex logic in code. Configuration should express choices, not implement mini-programs.
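To make the policy/mechanism split concrete, here is a minimal Python sketch. The config shape and names (`channel`, `error_rate_threshold`) are illustrative; in practice the dict would be loaded from a YAML file with `yaml.safe_load` rather than defined inline.

```python
# Policy: values a reviewer can change without touching logic.
# In practice this dict would come from a YAML file via yaml.safe_load.
POLICY = {
    "channel": "#ops-alerts",
    "error_rate_threshold": 0.02,  # "flag if error rate exceeds 2%"
}

def route_alert(error_rate: float, policy: dict):
    """Mechanism: stable logic that reads the policy, never hardcodes it."""
    if error_rate > policy["error_rate_threshold"]:
        return {"channel": policy["channel"], "error_rate": error_rate}
    return None  # below threshold: no action

print(route_alert(0.05, POLICY))  # exceeds 2% -> routes to #ops-alerts
print(route_alert(0.01, POLICY))  # under threshold -> None
```

Changing the channel or the threshold is now a one-line config diff; the function itself never changes.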
Designing a YAML config people can live with
Good configuration has three properties: it is hard to misunderstand, hard to misuse, and easy to review. The easiest way to get there is to keep a stable structure, use explicit defaults, and avoid cleverness.
Here is a compact, conceptual example of a YAML shape that scales past the first few tweaks without becoming unreadable:
# automation.yaml (conceptual structure)
version: 1

defaults:
  timezone: "America/New_York"
  dry_run: false
  limits:
    max_actions_per_run: 200
    max_actions_per_target: 5

sources:
  crm:
    enabled: true
    lookback_days: 7

rules:
  - name: "High-value lead follow-up"
    when:
      lead_score_gte: 80
      last_contact_days_gte: 2
    then:
      action: "notify"
      target: "sales-triage"
      template: "lead_followup_v1"

targets:
  sales-triage:
    type: "slack"
    channel: "#sales-triage"

templates:
  lead_followup_v1:
    text: "Follow up with {{lead_name}} (score {{lead_score}}). Last touch: {{last_contact_date}}."
Notice what this structure does well:
- Stable top-level keys (defaults, sources, rules, targets, templates) make it navigable.
- Rules are declarative: “when X, then Y” rather than embedded logic.
- Targets are named and referenced by ID, so you can change a channel without touching every rule.
- Limits live in defaults, creating a single place to enforce safety.
What this structure intentionally does not do: support nested if-else chains, arbitrary expressions, or user-defined functions. Those features look powerful but usually create a second programming language that is harder to test than your actual code.
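One way to keep rules declarative is a small interpreter in code that understands only a fixed vocabulary of conditions. This sketch assumes the `_gte` suffix convention from the example above (keys ending in `_gte` are numeric lower bounds; everything else is an exact match):

```python
def matches(when: dict, record: dict) -> bool:
    """Evaluate a declarative 'when' block against one record.

    Only a fixed vocabulary is supported: keys ending in _gte compare
    numerically; anything else is an exact-match condition. No loops,
    no expressions -- complex logic belongs in code, not config.
    """
    for key, expected in when.items():
        if key.endswith("_gte"):
            field = key[: -len("_gte")]
            if record.get(field, 0) < expected:
                return False
        elif record.get(key) != expected:
            return False
    return True

rule_when = {"lead_score_gte": 80, "last_contact_days_gte": 2}
print(matches(rule_when, {"lead_score": 91, "last_contact_days": 3}))  # True
print(matches(rule_when, {"lead_score": 91, "last_contact_days": 1}))  # False
```

Adding a new comparison (say, `_lte`) means one small, testable change in code, while every rule in config stays a flat "when X, then Y" description.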
Validation and guardrails (so config changes do not break production)
A config-driven bot is only safe if it refuses to run on bad config. Treat configuration as untrusted input, even if it lives in your repository. Small mistakes like a typo in a target name can silently route messages into the void unless you validate early.
A practical validation strategy
- Schema validation: required fields, allowed types, and allowed enums (for example, target.type must be slack or email).
- Cross-reference validation: every rule target must exist in targets; every template must exist in templates.
- Constraint validation: enforce numeric bounds (max_actions_per_run between 1 and 1000) and safe defaults.
- Behavior validation: detect “too broad” rules (for example, a rule with no when clause) and require explicit acknowledgement.
Pair this with two operational guardrails:
- Dry run mode: compute intended actions and log them, but do not perform side effects.
- Rate limits and caps: even with correct config, upstream data can spike; caps prevent a flood.
If you only implement one thing, implement “fail closed”: on any validation error, stop before sending notifications or writing data.
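The validation strategy above can be sketched as a single pass that collects every error before refusing to run. This is a minimal example against the conceptual structure shown earlier; the error-message wording and the 1-1000 bound are illustrative:

```python
def validate(config: dict) -> list:
    """Collect validation errors; the caller refuses to run on any error."""
    errors = []
    # Constraint validation: enforce numeric bounds on safety limits.
    limits = config.get("defaults", {}).get("limits", {})
    per_run = limits.get("max_actions_per_run")
    if not isinstance(per_run, int) or not 1 <= per_run <= 1000:
        errors.append("defaults.limits.max_actions_per_run must be 1..1000")
    for rule in config.get("rules", []):
        then = rule.get("then", {})
        # Cross-reference validation: targets and templates must exist.
        if then.get("target") not in config.get("targets", {}):
            errors.append(f"rule {rule.get('name')!r}: unknown target {then.get('target')!r}")
        if then.get("template") not in config.get("templates", {}):
            errors.append(f"rule {rule.get('name')!r}: unknown template {then.get('template')!r}")
        # Behavior validation: reject "too broad" rules with no conditions.
        if not rule.get("when"):
            errors.append(f"rule {rule.get('name')!r} has no 'when' clause")
    return errors

def run(config: dict) -> None:
    errors = validate(config)
    if errors:  # fail closed: no side effects on bad config
        raise SystemExit("config invalid:\n" + "\n".join(errors))
    ...  # safe to fetch, evaluate rules, and act
```

Collecting all errors (rather than stopping at the first) makes the failure message actionable for whoever edited the YAML.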
A concrete example: an onboarding follow-up bot
Imagine a small SaaS team with a simple automation: after a user signs up, check whether they completed setup within 48 hours. If not, notify the customer success channel with a short summary so someone can reach out.
The first version hardcodes everything: the 48-hour window, the channel, and the message text. Within a month, the requests start:
- Enterprise trials should wait 24 hours, not 48.
- Users in a specific region should route to a regional channel.
- Do not notify outside business hours.
- Change the wording to reduce back-and-forth questions.
With configuration-first design, you keep the code stable: fetch signups, compute elapsed time, evaluate rules, send notifications. The config expresses the business differences. For example, you add two rules (enterprise and standard), add a business-hours window in defaults, and map region IDs to targets. The team can review a single YAML diff that clearly shows “we changed enterprise wait time from 48 to 24” without reading code.
This style also supports gradual improvement. You can introduce a new template version, point one rule at it, and compare how it performs in practice without any branching in your script.
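Under the structure described earlier, those requests might land as a config diff like the following. All names and fields here (plans, targets, the `business_hours` block) are illustrative, not a fixed schema:

```yaml
rules:
  - name: "Enterprise trial follow-up"
    when:
      plan: "enterprise_trial"
      hours_since_signup_gte: 24   # changed from 48
      setup_complete: false
    then:
      action: "notify"
      target: "cs-enterprise"
      template: "setup_nudge_v2"
  - name: "Standard follow-up"
    when:
      hours_since_signup_gte: 48
      setup_complete: false
    then:
      action: "notify"
      target: "cs-general"
      template: "setup_nudge_v2"

defaults:
  business_hours:
    start: "09:00"
    end: "18:00"
    weekdays_only: true
```

A reviewer sees the 48-to-24 change and the new routing at a glance, with no code in the diff.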
Copyable checklist: ship a config-driven automation
Use this checklist when converting a script from hardcoded values to configuration-first design.
- Define the stable core: list the steps that should rarely change (fetch, filter, act, log).
- Identify “policy” knobs: thresholds, destinations, schedules, templates, toggles.
- Choose a config structure with named objects (targets/templates) instead of repeating fields in many rules.
- Add defaults for common settings (timezone, limits) and document them in comments near the file.
- Implement validation: schema, cross-references, bounds, and “too broad” detection.
- Add safety caps: max actions per run and per target, plus a dry run mode.
- Log config version and rule name with every action for traceability.
- Test with a small dataset and confirm dry run output matches expectations.
- Review the diff: ensure changes read like business intent, not like code.
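The dry run and traceability items from the checklist can be sketched together: every intended action is logged with the rule name and config version, and side effects happen only when `dry_run` is off. The log shape and field names are illustrative:

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("automation")

def execute(actions: list, config: dict) -> int:
    """Log every intended action; perform side effects only when dry_run is off."""
    dry_run = config.get("defaults", {}).get("dry_run", True)  # default to safe
    sent = 0
    for action in actions:
        # One structured log line per action: traceable and greppable.
        log.info(json.dumps({
            "config_version": config.get("version"),
            "rule": action["rule"],      # which rule fired
            "target": action["target"],
            "dry_run": dry_run,
        }))
        if not dry_run:
            sent += 1  # the real Slack/email call would go here
    return sent
```

Note that `dry_run` defaults to `True` when absent: a missing setting should produce no side effects, which is the same fail-closed posture as validation.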
Common mistakes and how to avoid them
- Turning YAML into a programming language: If you need expressions, loops, or nested conditional trees, move that logic into code and expose only the parameters.
- No validation: YAML that “loads” is not necessarily correct. Validate references and bounds before doing any work.
- Secrets in config: API keys and tokens belong in a secret manager or environment variables, not in a repo file.
- Inconsistent naming: Use stable IDs for targets and templates (sales-triage, lead_followup_v1). Avoid “Channel1” style names.
- Missing safety limits: A single bad rule can send thousands of messages if you do not cap actions per run.
- No audit trail: Log which rule fired, which config version ran, and what action was taken. Debugging without this is guesswork.
When not to use configuration-first
Configuration-first is not a default requirement. It is a tradeoff that adds structure and some upfront effort. Skip it, or keep it minimal, when:
- The automation is truly one-off and will be thrown away after a single run.
- The policy will not change (for example, a fixed nightly export with a stable destination).
- The “rules” require heavy computation that is easier to express and test in code than in a declarative format.
- You have no safe deployment path for config changes (no reviews, no validation, no way to dry run). In that case, adding a config file can increase risk.
A middle ground works well: start with a few config values (targets, thresholds) and expand only when change pressure appears.
Conclusion
Configuration-first automation is a simple pattern with outsized impact: it keeps small bots adaptable while protecting reliability. Put business decisions in YAML, keep execution logic in code, validate aggressively, and enforce safety caps. The result is an automation that can evolve through small, reviewable changes instead of risky rewrites.
FAQ
Is YAML always better than JSON for configuration?
Not always. YAML is easier for many people to read and annotate, but JSON is stricter and can be simpler to validate in some environments. Pick the format your team can reliably edit and review. The structure and validation matter more than the file extension.
How do I prevent non-engineers from accidentally breaking the bot?
Use validation plus a dry run mode, and require changes to be reviewed like code. The goal is not to block edits, but to make unsafe edits fail fast with clear error messages.
Where should feature toggles live?
If toggles are business-driven (enable a rule, disable a source), they fit well in config. If toggles are engineering-driven (switch an HTTP client implementation), keep them in code or deployment settings.
How big can a single config file get before it becomes unwieldy?
When a single file stops being reviewable. As a practical signal, if changes routinely touch unrelated sections, consider splitting by domain (per team, per workflow, or per environment) while keeping shared defaults consistent.