Config File

sznuper uses a single YAML config file. By default it looks for:

  • ~/.config/sznuper/config.yml (as user)
  • /etc/sznuper/config.yml (as root)

Override with --config <path>. Environment variables are supported anywhere in the file using ${VAR_NAME} syntax.
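The ${VAR_NAME} substitution can be pictured with a short Python sketch. This is an illustration only, not sznuper's actual implementation; the function name expand_env, the set of allowed name characters, and the handling of unset variables (expanded to an empty string here) are all assumptions.

```python
import os
import re

# Matches ${VAR_NAME}; the allowed name characters are an assumption.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace each ${VAR_NAME} in text with its value from env."""
    env = os.environ if env is None else env
    # Assumption: an unset variable expands to an empty string.
    return _VAR.sub(lambda m: env.get(m.group(1), ""), text)


# Hypothetical token value, for illustration only.
url = expand_env("telegram://${TELEGRAM_TOKEN}@telegram",
                 {"TELEGRAM_TOKEN": "abc123"})
print(url)  # telegram://abc123@telegram
```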

The config has four top-level sections: options, globals, services, and alerts.

The options section sets paths for healthcheck storage, caching, and logs.

options:
  healthchecks_dir: /etc/sznuper/healthchecks
  cache_dir: /var/cache/sznuper
  logs_dir: /var/log/sznuper

All fields are optional. Defaults depend on whether sznuper runs as root or user.

Field             Root default               User default
healthchecks_dir  /etc/sznuper/healthchecks  ~/.config/sznuper/healthchecks
cache_dir         /var/cache/sznuper         ~/.cache/sznuper
logs_dir          /var/log/sznuper           ~/.local/state/sznuper/logs
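As a rough sketch of how the defaults above combine with explicit settings, the Python below overlays configured options on the root or user defaults. The function names default_paths and resolve_options are hypothetical, and the overlay behavior is an assumption based on "all fields are optional"; sznuper's real lookup may differ.

```python
from pathlib import PurePosixPath


def default_paths(is_root, home):
    """Return the default directories listed in the table above."""
    if is_root:
        return {
            "healthchecks_dir": PurePosixPath("/etc/sznuper/healthchecks"),
            "cache_dir": PurePosixPath("/var/cache/sznuper"),
            "logs_dir": PurePosixPath("/var/log/sznuper"),
        }
    home = PurePosixPath(home)
    return {
        "healthchecks_dir": home / ".config/sznuper/healthchecks",
        "cache_dir": home / ".cache/sznuper",
        "logs_dir": home / ".local/state/sznuper/logs",
    }


def resolve_options(options, is_root, home):
    """Overlay explicitly configured options on the defaults (assumed behavior)."""
    resolved = default_paths(is_root, home)
    resolved.update({key: PurePosixPath(value) for key, value in options.items()})
    return resolved


# Hypothetical user and override, for illustration only.
paths = resolve_options({"cache_dir": "/tmp/sznuper-cache"},
                        is_root=False, home="/home/alice")
```

Here only cache_dir is overridden; healthchecks_dir and logs_dir keep their user defaults under /home/alice.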

The globals section holds arbitrary key-value pairs that are accessible in notification templates. It is useful for shared values such as the hostname.

globals:
  hostname: my-server
  environment: production

The services section defines notification services using Shoutrrr URLs. Each service has a name, a URL, and optional default params.

The service name is arbitrary - it’s just a label you use to reference it later in alerts. You can have multiple services of the same type with different names:

services:
  telegram-ops:
    url: telegram://${TELEGRAM_TOKEN}@telegram
    params:
      chats: ${OPS_CHAT_ID}
  telegram-alerts:
    url: telegram://${TELEGRAM_TOKEN}@telegram
    params:
      chats: ${ALERTS_CHAT_ID}
  discord:
    url: discord://${DISCORD_TOKEN}@${DISCORD_WEBHOOK_ID}

Then reference them by name in your alerts:

alerts:
  - name: disk
    ...
    notify:
      - telegram-ops
      - discord

The alerts section is a list of alerts. Each alert defines what to check, when to check it, and who to notify. Example:

alerts:
  - name: disk
    healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/disk_usage
    sha256: abc123...
    triggers:
      - interval: "5m"
    timeout: "30s"
    args:
      mount: /
      threshold_warn_percent: 80
      threshold_crit_percent: 95
    template: "Disk usage on {{globals.hostname}}: {{event.usage_percent}}%"
    cooldown: "1h"
    notify:
      - telegram

Field         Required  Description
name          yes       Unique name for this alert
healthcheck   yes       URI of the healthcheck (file://, https://, or builtin://)
sha256        no        SHA-256 hash for remote healthchecks, or false to skip verification
triggers      no        List of triggers (see below)
timeout       no        Max execution time (e.g. "30s")
args          no        Key-value arguments passed as HEALTHCHECK_ARG_* env vars
side_effects  no        Shell commands to run after event processing
template      yes       Go template for the notification message (see below)
cooldown      no        Suppress repeated notifications (e.g. "5m", "1h")
notify        yes       List of services to notify
events        no        Per-event-type configuration (see below)
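To make the args row concrete, here is a minimal Python sketch of how an alert's args could become HEALTHCHECK_ARG_* environment variables. The exact naming rule (upper-casing the key) and the stringification of values are assumptions, and args_to_env is a hypothetical helper, not part of sznuper.

```python
def args_to_env(args):
    """Map alert args to HEALTHCHECK_ARG_* environment variables."""
    # Assumption: keys are upper-cased and values are converted to strings.
    return {
        f"HEALTHCHECK_ARG_{key.upper()}": str(value)
        for key, value in args.items()
    }


env = args_to_env({"mount": "/", "threshold_warn_percent": 80})
for name, value in env.items():
    print(f"{name}={value}")
```

Under these assumptions, a healthcheck would read the mount point from HEALTHCHECK_ARG_MOUNT.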

The triggers field is a list of triggers. Each alert can have multiple triggers, and they all run independently. Example:

triggers:
  - interval: "5m"
  - cron: "0 9 * * 1"
  - cron: "0 18 * * *"

This alert would run every 5 minutes, every Monday at 9am, and every day at 6pm.

Available trigger types:

Type       Description
interval   Run on a fixed interval (e.g. "5m", "30s")
cron       Cron expression, 5 or 6 fields (e.g. "0 9 * * *")
watch      Run when a file changes (e.g. /var/log/app.log)
pipe       Continuous shell command whose stdout is fed to the healthcheck (e.g. "tail -F /var/log/app.log")
lifecycle  Special trigger that fires on daemon start and stop. Only works with the builtin://lifecycle healthcheck.
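The duration strings used by interval, timeout, and cooldown ("30s", "5m", "1h") can be read as a number followed by a unit. The Python sketch below converts them to seconds. It assumes a Go-style duration syntax limited to the s, m, and h units, plus a bare "0"; the formats sznuper actually accepts may be broader, and parse_duration is a hypothetical name.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+)([smh])")


def parse_duration(text):
    """Convert a duration such as "30s", "5m", or "1h30m" to seconds."""
    if text == "0":
        return 0
    # Reject anything that is not a sequence of <number><unit> parts.
    if not re.fullmatch(r"(\d+[smh])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(number) * _UNITS[unit] for number, unit in _PART.findall(text))


print(parse_duration("5m"))   # 300
print(parse_duration("30s"))  # 30
```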

Templates use Go’s text/template syntax with Sprig functions. Four scopes are available:

Scope    Description
event    Fields from the healthcheck output (e.g. {{event.type}}, {{event.usage_percent}})
globals  Values from the globals config section (e.g. {{globals.hostname}})
alert    Alert metadata (e.g. {{alert.name}})
args     Arguments from the alert’s args field (e.g. {{args.mount}})

Example:

template: |-
  [{{event.type | upper}}] {{globals.hostname}}:
  Disk {{args.mount}} at {{event.usage_percent}}% ({{event.available}} remaining)

The notify field is a list of services to notify. In the simplest form, each entry is just the service name:

notify:
  - telegram
  - discord

You can also override params per notification. The params are merged on top of the service’s base params - any key you set here wins over the service default. Params are passed as query parameters in the Shoutrrr URL.

notify:
  - telegram
  - telegram:
      params:
        chats: ${ANOTHER_CHAT_ID} # override the default chat
        notification: "false" # send silently
  - discord:
      params:
        username: sznuper-bot
        avatar_url: https://example.com/avatar.png

This sends to the default telegram chat, a second telegram chat silently, and discord with a custom bot name and avatar.
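The merge rule for params can be sketched in Python as a dictionary overlay followed by query-string encoding. This is an illustrative model only: build_url is a hypothetical helper, the token and chat IDs are placeholders, and the sketch assumes the service URL carries no query string of its own.

```python
from urllib.parse import urlencode


def build_url(service, override_params=None):
    """Merge per-notification params over the service defaults and
    append the result to the service URL as query parameters."""
    # Keys set in override_params win over the service's base params.
    params = {**service.get("params", {}), **(override_params or {})}
    if not params:
        return service["url"]
    # Assumption: the base URL has no existing query string.
    return f"{service['url']}?{urlencode(params)}"


telegram = {"url": "telegram://TOKEN@telegram", "params": {"chats": "111"}}

print(build_url(telegram))
# telegram://TOKEN@telegram?chats=111
print(build_url(telegram, {"chats": "222", "notification": "false"}))
# telegram://TOKEN@telegram?chats=222&notification=false
```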

Each alert has template, notify, and optionally cooldown that apply to all event types by default. The events section lets you override these per event type.

Field         Description
healthy       List of event types considered healthy. When sznuper sees a healthy event after unhealthy ones, it resets cooldowns.
on_unmatched  What to do with event types not listed in override: "notify" (default) or "drop".
override      Per-event-type overrides for template, cooldown, and notify.

For example, say you have a disk usage alert with a default cooldown of "1h" and a simple template. But for critical_usage events you want a more urgent message, a shorter cooldown, and to also notify discord:

alerts:
  - name: disk
    healthcheck: ...
    triggers:
      - interval: "5m"
    template: |-
      [{{event.type | upper}}] {{globals.hostname}}:
      Disk at {{event.usage_percent}}%
    cooldown: "1h"
    notify:
      - telegram
    events:
      healthy:
        - ok
      on_unmatched: notify
      override:
        critical_usage:
          template: |-
            CRITICAL: {{globals.hostname}} disk at {{event.usage_percent}}%!
            Only {{event.available}} remaining on {{args.mount}}
          cooldown: "5m"
          notify:
            - telegram
            - discord

Here, ok and high_usage events use the alert-level defaults (1h cooldown, telegram only). But critical_usage gets its own template, a 5m cooldown, and notifies both telegram and discord.
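The resolution described above can be modeled with a small Python sketch: start from the alert-level template, cooldown, and notify, then apply any override for the event type. The function resolve_event is hypothetical, and the handling of on_unmatched (returning None for dropped events) is an assumption about semantics the text does not spell out in full.

```python
def resolve_event(alert, event_type):
    """Return the effective template, cooldown, and notify list for an
    event type, or None if the event is dropped."""
    events = alert.get("events", {})
    overrides = events.get("override", {})
    # Assumption: "drop" discards every event type not listed in override.
    if event_type not in overrides and events.get("on_unmatched", "notify") == "drop":
        return None
    effective = {key: alert.get(key) for key in ("template", "cooldown", "notify")}
    # Per-event overrides win over the alert-level defaults.
    effective.update(overrides.get(event_type, {}))
    return effective


# Shortened stand-in for the disk alert above; template text is abbreviated.
alert = {
    "template": "default message",
    "cooldown": "1h",
    "notify": ["telegram"],
    "events": {
        "healthy": ["ok"],
        "on_unmatched": "notify",
        "override": {
            "critical_usage": {
                "template": "critical message",
                "cooldown": "5m",
                "notify": ["telegram", "discord"],
            }
        },
    },
}

print(resolve_event(alert, "high_usage")["cooldown"])      # 1h
print(resolve_event(alert, "critical_usage")["cooldown"])  # 5m
```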

Complete example:

options:
  healthchecks_dir: /etc/sznuper/healthchecks
  cache_dir: /var/cache/sznuper
  logs_dir: /var/log/sznuper
globals:
  hostname: my-server
services:
  telegram:
    url: telegram://${TELEGRAM_TOKEN}@telegram
    params:
      chats: ${TELEGRAM_CHAT_ID}
alerts:
  - name: disk
    healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/disk_usage
    sha256: abc123...
    triggers:
      - interval: "10m"
    args:
      mount: /
      threshold_warn_percent: 80
      threshold_crit_percent: 95
    template: |-
      Disk usage on {{globals.hostname}}
      Mount: {{event.mount}}
      Usage: {{event.usage_percent}}%
      Available: {{event.available}}
    cooldown: "1h"
    notify:
      - telegram
    events:
      healthy:
        - ok
  - name: ssh
    healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/ssh_journal
    sha256: abc123...
    triggers:
      - pipe: "journalctl -fu sshd --output=json"
    template: |-
      SSH {{event.type}} on {{globals.hostname}}
      User: {{event.user}}
      Host: {{event.host}}
    notify:
      - telegram
    events:
      override:
        login:
          cooldown: "0"
        failure:
          cooldown: "5m"