WireGuard as the Backbone of SD-WAN Tunnels
Every Hopbox SD-WAN device establishes encrypted tunnels back to our hub infrastructure. These tunnels carry POS transactions, CCTV backhaul, management traffic, and DNS queries. The tunnel protocol is the foundation of the entire overlay network.
We evaluated IPsec, OpenVPN, and WireGuard. WireGuard won decisively. This post explains why, and how we manage WireGuard at the scale of 900+ sites.
The Contenders
IPsec is the enterprise standard. IKEv2 + ESP provides strong encryption, broad interoperability, and decades of battle-testing. It is also complex.
An IPsec tunnel involves:
- IKE SA (Phase 1 in IKEv1 terms, IKE_SA_INIT and IKE_AUTH in IKEv2): negotiate a security association (SA) for the control channel
- Child SAs (Phase 2 in IKEv1 terms, CHILD_SA in IKEv2): negotiate SAs for the data channel
- Multiple cipher suites to configure and agree upon
- Certificate or PSK authentication
- NAT-T (NAT Traversal) encapsulation if either side is behind NAT
On an embedded OpenWrt device with limited CPU, IPsec’s encryption overhead is noticeable. The strongSwan userspace daemon handles IKE negotiation, and ESP processing happens in the kernel — but the handshake latency and rekeying overhead add up when you are managing hundreds of tunnels.

    # IPsec connection establishment (typical)
    # IKE_SA_INIT:     ~50ms RTT
    # IKE_AUTH:        ~50ms RTT
    # CREATE_CHILD_SA: ~50ms RTT
    # Total setup:     ~150-300ms (varies with network conditions)
    #
    # Plus: periodic rekeying every 1-8 hours depending on config

OpenVPN
OpenVPN is the FOSS community’s default choice. It is well-understood, runs in userspace, and is easy to configure. The problem is performance.
OpenVPN processes every packet in userspace. On a 100 Mbps link, OpenVPN on a typical embedded device maxes out at 30-50 Mbps — the CPU becomes the bottleneck, copying packets between kernel and userspace and encrypting them in OpenSSL.

    # OpenVPN throughput on typical embedded hardware
    # (ARM Cortex-A53, quad-core, 1.2GHz)
    #
    # OpenVPN with AES-256-GCM:     ~35 Mbps
    # OpenVPN with ChaCha20:        ~40 Mbps
    # Native throughput (no VPN):   ~940 Mbps
    #
    # OpenVPN uses ~80% CPU at these speeds

OpenVPN also has a slow handshake: TLS negotiation over TCP or UDP takes multiple round trips, and reconnection after a link flap is not instant.
WireGuard
On Linux, WireGuard’s data path runs entirely in the kernel. There is no userspace daemon processing packets. The wg interface is a kernel network interface: packets go in, get encrypted by ChaCha20-Poly1305, and come out the other side. No context switches, no copies between userspace and kernel.

    # WireGuard throughput on the same embedded hardware
    # (ARM Cortex-A53, quad-core, 1.2GHz)
    #
    # WireGuard with ChaCha20-Poly1305:  ~250 Mbps
    # Native throughput (no VPN):        ~940 Mbps
    #
    # WireGuard uses ~25% CPU at 250 Mbps

The numbers speak for themselves. WireGuard delivers 5-7x the throughput of OpenVPN on the same hardware, at a fraction of the CPU cost.
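To make the comparison concrete, the ratios can be worked out directly from the figures quoted above. The following is a small sketch using only those numbers; the "Mbps per CPU percent" figure is a derived illustration, not a separate measurement.

```python
# Figures quoted in this post for a quad-core Cortex-A53 at 1.2 GHz.
OPENVPN_MBPS = {"AES-256-GCM": 35, "ChaCha20": 40}
OPENVPN_CPU_PCT = 80
WIREGUARD_MBPS = 250
WIREGUARD_CPU_PCT = 25


def speedup(wireguard_mbps: float, openvpn_mbps: float) -> float:
    """Throughput ratio of WireGuard over OpenVPN."""
    return wireguard_mbps / openvpn_mbps


def mbps_per_cpu_percent(mbps: float, cpu_pct: float) -> float:
    """Derived efficiency figure: throughput per percentage point of CPU."""
    return mbps / cpu_pct


for cipher, mbps in OPENVPN_MBPS.items():
    print(f"vs OpenVPN {cipher}: {speedup(WIREGUARD_MBPS, mbps):.2f}x")

wg_eff = mbps_per_cpu_percent(WIREGUARD_MBPS, WIREGUARD_CPU_PCT)
ovpn_eff = mbps_per_cpu_percent(OPENVPN_MBPS["ChaCha20"], OPENVPN_CPU_PCT)
print(f"WireGuard: {wg_eff:.1f} Mbps per CPU %, OpenVPN: {ovpn_eff:.1f} Mbps per CPU %")
```

Against the 35-40 Mbps benchmark figures this works out to roughly 6x to 7x, and to 5x against the 50 Mbps upper bound quoted earlier for OpenVPN. The per-CPU figure is about 10 Mbps per percentage point for WireGuard versus 0.5 for OpenVPN.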
Why WireGuard Won
1. Performance
Covered above. On embedded devices where CPU is the constraint, kernel-space processing is a necessity, not an option. Our devices need to push encrypted POS, CCTV, and management traffic simultaneously. Every CPU cycle spent on VPN overhead is a cycle not available for QoS processing, firewalling, or monitoring.
2. Simplicity
A WireGuard configuration is a handful of lines:

    [Interface]
    PrivateKey = <device_private_key>
    Address = 10.200.42.2/32
    MTU = 1420

    [Peer]
    PublicKey = <hub_public_key>
    Endpoint = hub-west-01.hopbox.in:51820
    AllowedIPs = 10.200.0.0/16, 10.100.0.0/16
    PersistentKeepalive = 25

Compare that to an IPsec ipsec.conf or an OpenVPN .ovpn file. WireGuard’s configuration surface is small enough that we can generate, validate, and audit it programmatically with confidence.
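As an illustration of how small that surface is, here is a minimal sketch of generating and validating such a config in Python. The `TunnelConfig` record, its field names, and the specific checks are assumptions made for this example, not our actual provisioning code.

```python
import base64
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelConfig:
    """Hypothetical per-device record; field names are illustrative."""
    private_key: str
    address: str
    hub_public_key: str
    endpoint: str
    allowed_ips: tuple
    mtu: int = 1420
    keepalive: int = 25


def is_wireguard_key(value: str) -> bool:
    """WireGuard keys are 32 raw bytes, base64-encoded."""
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except ValueError:  # binascii.Error is a ValueError subclass
        return False


def validate(cfg: TunnelConfig) -> list:
    """Return a list of problems; an empty list means the config looks sane."""
    errors = []
    if not is_wireguard_key(cfg.private_key):
        errors.append("private_key is not a 32-byte base64 value")
    if not is_wireguard_key(cfg.hub_public_key):
        errors.append("hub_public_key is not a 32-byte base64 value")
    try:
        ipaddress.ip_interface(cfg.address)
    except ValueError:
        errors.append(f"invalid address: {cfg.address}")
    for cidr in cfg.allowed_ips:
        try:
            # strict=True rejects entries with host bits set, e.g. 10.200.0.1/16
            ipaddress.ip_network(cidr, strict=True)
        except ValueError:
            errors.append(f"invalid AllowedIPs entry: {cidr}")
    host, _, port = cfg.endpoint.rpartition(":")
    if not host or not port.isdigit() or not 0 < int(port) < 65536:
        errors.append(f"invalid endpoint: {cfg.endpoint}")
    return errors


def render(cfg: TunnelConfig) -> str:
    """Render the config in the same layout as the example above."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {cfg.private_key}",
        f"Address = {cfg.address}",
        f"MTU = {cfg.mtu}",
        "",
        "[Peer]",
        f"PublicKey = {cfg.hub_public_key}",
        f"Endpoint = {cfg.endpoint}",
        f"AllowedIPs = {', '.join(cfg.allowed_ips)}",
        f"PersistentKeepalive = {cfg.keepalive}",
    ]) + "\n"


# All-zero placeholder key for demonstration only, never a real key.
DUMMY_KEY = base64.b64encode(bytes(32)).decode()
example = TunnelConfig(
    private_key=DUMMY_KEY,
    address="10.200.42.2/32",
    hub_public_key=DUMMY_KEY,
    endpoint="hub-west-01.hopbox.in:51820",
    allowed_ips=("10.200.0.0/16", "10.100.0.0/16"),
)
if not validate(example):
    print(render(example))
```

Because every field has a fixed, checkable shape, a validator like this can run before any config is pushed to a device.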
3. Roaming and Reconnection
WireGuard is stateless from the network perspective. There is no “connection” to establish or maintain. If a device’s WAN IP changes (common with 4G failover), WireGuard simply updates the endpoint when the next authenticated packet arrives. There is no renegotiation and no new handshake, just a seamless transition.
This is critical for SD-WAN. When a device fails over from fiber to 4G, its public IP changes. With IPsec or OpenVPN, that means a reconnection delay. With WireGuard, the hub sees packets arriving from a new source IP, verifies the cryptographic authentication, and updates its endpoint record. Traffic continues to flow.
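The endpoint update can be pictured with a toy model. This is for intuition only and is not WireGuard's implementation: an HMAC over the payload stands in for WireGuard's authenticated encryption, and the `HubPeer` class, its methods, and the addresses are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple

Endpoint = Tuple[str, int]


def seal(session_key: bytes, payload: bytes) -> bytes:
    """Stand-in for authenticated encryption: compute a tag over the payload."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()


class HubPeer:
    """Toy stand-in for the hub's per-peer state."""

    def __init__(self, session_key: bytes) -> None:
        self.session_key = session_key
        self.endpoint: Optional[Endpoint] = None

    def receive(self, source: Endpoint, payload: bytes, tag: bytes) -> bool:
        expected = seal(self.session_key, payload)
        if not hmac.compare_digest(expected, tag):
            return False  # unauthenticated packet: endpoint is left untouched
        self.endpoint = source  # roam to wherever the valid packet came from
        return True


key = b"k" * 32
peer = HubPeer(key)
fiber = ("198.51.100.7", 34892)   # documentation-range addresses
lte = ("203.0.113.9", 40122)

peer.receive(fiber, b"pos-txn", seal(key, b"pos-txn"))
peer.receive(lte, b"pos-txn-2", seal(key, b"pos-txn-2"))      # failover to 4G
peer.receive(("192.0.2.66", 1), b"spoofed", b"\x00" * 32)      # rejected
print(peer.endpoint)
```

The point of the model is the ordering: authentication is checked first, and only a packet that passes it can move the endpoint, so a spoofed source address cannot hijack the tunnel.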

    # WireGuard peer status showing roaming (output trimmed to one peer)
    root@hub-west-01:~# wg show wg0
    peer: <device_public_key>
      endpoint: 49.36.xx.xx:34892    # <-- this updates automatically on failover
      allowed ips: 10.200.42.2/32
      latest handshake: 12 seconds ago
      transfer: 1.24 GiB received, 892.45 MiB sent

4. Cryptographic Simplicity
WireGuard uses a fixed set of modern primitives: ChaCha20-Poly1305 for symmetric encryption, Curve25519 for key exchange, BLAKE2s for hashing, SipHash for hashtable keys. There is no cipher negotiation, no downgrade attacks, no configuration knob for choosing between AES-128, AES-256, 3DES, or RC4.
This is a feature, not a limitation. Cipher negotiation is a source of bugs and misconfiguration in IPsec and TLS. WireGuard’s approach means every tunnel uses the same strong cryptography — no exceptions.
WireGuard on OpenWrt
OpenWrt has first-class WireGuard support via the kmod-wireguard kernel module and the wireguard-tools package.

    # Install on OpenWrt
    opkg update
    opkg install kmod-wireguard wireguard-tools

    # OpenWrt UCI configuration (equivalent to the wg config above)
    uci set network.wg0=interface
    uci set network.wg0.proto='wireguard'
    uci set network.wg0.private_key='<device_private_key>'
    uci set network.wg0.listen_port='51820'
    uci add_list network.wg0.addresses='10.200.42.2/32'
    uci set network.wg0.mtu='1420'

    uci add network wireguard_wg0
    uci set network.@wireguard_wg0[-1].public_key='<hub_public_key>'
    uci set network.@wireguard_wg0[-1].endpoint_host='hub-west-01.hopbox.in'
    uci set network.@wireguard_wg0[-1].endpoint_port='51820'
    uci add_list network.@wireguard_wg0[-1].allowed_ips='10.200.0.0/16'
    uci add_list network.@wireguard_wg0[-1].allowed_ips='10.100.0.0/16'
    uci set network.@wireguard_wg0[-1].persistent_keepalive='25'

    uci commit network
    /etc/init.d/network reload

The UCI integration means WireGuard configuration is managed alongside all other network configuration on the device: same syntax, same commit/apply model, same Ansible modules.
Key Management at 900+ Sites
WireGuard’s simplicity has one major operational implication: key management is your problem.
Each device has a unique Curve25519 key pair. Each hub has a key pair. Every device-to-hub relationship requires the hub to know the device’s public key, and the device to know the hub’s public key. At 900+ sites with multiple hubs, that is thousands of key pairs to generate, distribute, and rotate.
Key Generation Pipeline
Keys are generated during device provisioning, never on the device itself (to avoid weak entropy on embedded hardware).

    # Key generation in the provisioning pipeline
    umask 077   # keep the private key file unreadable by other users
    wg genkey | tee /tmp/device_private.key | wg pubkey > /tmp/device_public.key

    # Private key goes into the device config (encrypted at rest)
    # Public key goes into the hub config and the inventory database

    # Simplified Ansible template for key distribution
    - name: Generate WireGuard keypair for new device
      command: wg genkey
      register: wg_private_key
      delegate_to: localhost
      no_log: true

    - name: Derive public key
      shell: "echo '{{ wg_private_key.stdout }}' | wg pubkey"
      register: wg_public_key
      delegate_to: localhost
      no_log: true

    - name: Store keys in vault
      ansible.builtin.include_role:
        name: vault_store
      vars:
        site_id: "{{ inventory_hostname }}"
        private_key: "{{ wg_private_key.stdout }}"
        public_key: "{{ wg_public_key.stdout }}"

Automated Key Rotation
WireGuard does not have built-in key rotation. The tunnel uses the configured key pair until you change it. For a security-conscious deployment, we rotate keys on a regular schedule:
- Provisioning server generates a new key pair for the device
- New public key is pushed to the hub configuration
- New private key is pushed to the device
- Device applies new config — tunnel re-establishes with new keys
- Old key is removed from the hub after confirming the new tunnel is active
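The ordering in the steps above can be sketched as a pair of command builders plus a check that gates removal of the old key. This is a minimal sketch: the function names and the 180-second freshness threshold are assumptions for illustration, not the production pipeline. The handshake timestamp is assumed to come from `wg show wg0 latest-handshakes`, which reports a Unix timestamp per peer (0 if no handshake has happened yet).

```python
import time
from typing import List, Optional


def hub_add_peer_cmd(interface: str, new_pubkey: str, tunnel_ip: str) -> List[str]:
    """Hub-side command that installs the device's new public key."""
    return ["wg", "set", interface, "peer", new_pubkey,
            "allowed-ips", f"{tunnel_ip}/32"]


def hub_remove_peer_cmd(interface: str, old_pubkey: str) -> List[str]:
    """Hub-side command that retires the device's old public key."""
    return ["wg", "set", interface, "peer", old_pubkey, "remove"]


def new_key_is_live(latest_handshake: int,
                    now: Optional[float] = None,
                    max_age: int = 180) -> bool:
    """True if the new peer has completed a handshake within max_age seconds.

    A timestamp of 0 means the peer has never completed a handshake.
    """
    if latest_handshake == 0:
        return False
    current = time.time() if now is None else now
    return current - latest_handshake <= max_age


# Example with placeholder keys: only retire the old key once the new one is live.
add_cmd = hub_add_peer_cmd("wg0", "<new_device_pubkey>", "10.200.42.2")
remove_cmd = hub_remove_peer_cmd("wg0", "<old_device_pubkey>")
print(" ".join(add_cmd))
if new_key_is_live(latest_handshake=1711452823, now=1711452900):
    print(" ".join(remove_cmd))
```

Returning argument lists rather than shell strings lets the caller pass them to `subprocess.run` without quoting issues.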

    # Key rotation: add new key to hub before removing old
    # This keeps the cutover window as short as possible

    # Step 1: Add new peer entry on hub (new public key)
    # Note: an allowed IP belongs to one peer at a time on an interface,
    # so this moves 10.200.42.2/32 from the old key to the new one
    wg set wg0 peer '<new_device_pubkey>' allowed-ips 10.200.42.2/32

    # Step 2: Push new private key to device, reload
    # (via Ansible / management plane)

    # Step 3: Verify tunnel is active with new key
    wg show wg0 | grep -A4 '<new_device_pubkey>'
    # latest handshake should be recent

    # Step 4: Remove old peer entry from hub
    wg set wg0 peer '<old_device_pubkey>' remove

Tunnel Monitoring
Every WireGuard tunnel is monitored via Prometheus. The key metric is latest handshake. WireGuard renegotiates session keys roughly every 2 minutes while traffic is flowing, and the 25-second persistent keepalive in our config ensures there is always traffic. On a healthy tunnel, the latest handshake is therefore never more than a few minutes old.

    # prometheus-wireguard-exporter metrics
    wireguard_latest_handshake_seconds{interface="wg0",public_key="<key>"} 1711452823
    wireguard_received_bytes_total{interface="wg0",public_key="<key>"} 1334567890
    wireguard_sent_bytes_total{interface="wg0",public_key="<key>"} 987654321

Alert rule:

    - alert: WireGuardTunnelDown
      expr: time() - wireguard_latest_handshake_seconds > 300
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "WireGuard tunnel to {{ $labels.public_key }} has not had a handshake in 5+ minutes"

If the latest handshake is more than 5 minutes old, the tunnel is effectively dead: the device is unreachable, likely due to a WAN outage or a deeper issue.
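The same staleness test can be run ad hoc on a hub without Prometheus. The sketch below assumes the tab-separated output of `wg show wg0 latest-handshakes` (one line per peer: public key, then a Unix timestamp, with 0 meaning no handshake yet); the `stale_peers` function name and the sample data are illustrative.

```python
from typing import List


def stale_peers(latest_handshakes_output: str, now: float, max_age: int = 300) -> List[str]:
    """Return public keys whose last handshake is older than max_age seconds.

    Peers that have never completed a handshake (timestamp 0) count as stale.
    """
    stale = []
    for line in latest_handshakes_output.splitlines():
        if not line.strip():
            continue
        public_key, timestamp = line.split()
        if now - int(timestamp) > max_age:
            stale.append(public_key)
    return stale


# Sample data with placeholder keys; timestamps are Unix seconds.
sample = (
    "peerA_pubkey=\t1711452823\n"   # 77 seconds old at the 'now' used below
    "peerB_pubkey=\t1711452000\n"   # 900 seconds old
    "peerC_pubkey=\t0\n"            # never completed a handshake
)
print(stale_peers(sample, now=1711452900))
```

This mirrors the alert expression: a peer is flagged when the current time minus its latest handshake exceeds 300 seconds.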
MTU Considerations
MTU misconfiguration is the most common cause of “tunnel is up but traffic is broken” issues. WireGuard adds overhead to every packet:

    Outer IP header:    20 bytes (IPv4) or 40 bytes (IPv6)
    Outer UDP header:    8 bytes
    WireGuard header:   32 bytes (16-byte data header + 16-byte auth tag)
    Padding:            0-15 bytes (never pushes a packet past the interface MTU)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Total overhead:     60 bytes over IPv4, 80 bytes over IPv6

If the path MTU is the standard 1500 bytes, the inner MTU must be set to 1420 (1500 - 80 bytes of overhead, which leaves room for an IPv6 outer header) or lower. If the WAN link has a lower MTU (common with PPPoE at 1492), the inner MTU drops further.
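The 80-byte budget behind the 1420 figure can be reproduced with a few constants. This sketch assumes the usual WireGuard data-packet layout (a 16-byte header plus a 16-byte authentication tag) and sizes the budget for an IPv6 outer header by default, which is the worst case; the function names are illustrative.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20               # without TCP options
WIREGUARD_DATA_OVERHEAD = 32  # 16-byte data header + 16-byte Poly1305 tag


def tunnel_mtu(path_mtu: int, outer_ipv6: bool = True) -> int:
    """Largest inner packet that fits in one outer packet of size path_mtu."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return path_mtu - outer_ip - UDP_HEADER - WIREGUARD_DATA_OVERHEAD


def tcp_mss(mtu: int, inner_ipv6: bool = False) -> int:
    """TCP payload size that fits inside an inner packet of size mtu."""
    inner_ip = IPV6_HEADER if inner_ipv6 else IPV4_HEADER
    return mtu - inner_ip - TCP_HEADER


print("Ethernet:", tunnel_mtu(1500))        # standard 1500-byte path
print("PPPoE:   ", tunnel_mtu(1492))        # PPPoE uplink
print("MSS at 1420:", tcp_mss(tunnel_mtu(1500)))
```

With these constants, a 1500-byte path gives 1420, a 1492-byte PPPoE path gives 1412, and the matching IPv4 TCP MSS for a 1420-byte tunnel is 1380.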

    # Set WireGuard interface MTU
    ip link set wg0 mtu 1420

    # For PPPoE uplinks (MTU 1492):
    ip link set wg0 mtu 1412

    # Clamp TCP MSS to match the tunnel MTU
    iptables -t mangle -A FORWARD -o wg0 -p tcp --tcp-flags SYN,RST SYN \
      -j TCPMSS --clamp-mss-to-pmtu

We set MSS clamping on every device to avoid TCP blackhole issues where large packets are silently dropped because they exceed the path MTU and DF (Don’t Fragment) is set.
Hub Architecture
The tunnel endpoints are not a single server. Our hub infrastructure is distributed across regions:

                    ┌──────────────┐
                    │  hub-north   │
                    │   (Delhi)    │
                    └──────┬───────┘
                           │
    ┌──────────────┐       │       ┌──────────────┐
    │   hub-west   │───────┼───────│   hub-east   │
    │   (Mumbai)   │       │       │  (Kolkata)   │
    └──────────────┘       │       └──────────────┘
                           │
                    ┌──────┴───────┐
                    │  hub-south   │
                    │  (Chennai)   │
                    └──────────────┘

    Each hub: WireGuard endpoint + routing + PowerDNS resolver
    Devices connect to nearest regional hub
    Hubs mesh with each other for inter-region traffic

Each device connects to its nearest regional hub. If a hub goes down, devices can be redirected to an alternate hub: the management plane pushes a config update changing the WireGuard endpoint, and the device re-establishes the tunnel.
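The redirect decision itself can be very small. Here is a minimal sketch of choosing an alternate hub, assuming a per-region preference list and a set of currently healthy hubs supplied by monitoring. The preference order, the `FAILOVER_ORDER` table, and the `pick_hub` function are hypothetical; only the hub names come from the diagram above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical preference order per region: nearest hub first.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "west": ["hub-west", "hub-north", "hub-south", "hub-east"],
    "north": ["hub-north", "hub-west", "hub-east", "hub-south"],
    "south": ["hub-south", "hub-west", "hub-east", "hub-north"],
    "east": ["hub-east", "hub-north", "hub-south", "hub-west"],
}


def pick_hub(region: str, healthy: Set[str]) -> Optional[str]:
    """Return the first healthy hub in the region's preference order."""
    for hub in FAILOVER_ORDER[region]:
        if hub in healthy:
            return hub
    return None  # no hub reachable: leave the current endpoint in place


all_hubs = {"hub-north", "hub-south", "hub-east", "hub-west"}
print(pick_hub("west", all_hubs))                  # normal operation
print(pick_hub("west", all_hubs - {"hub-west"}))   # Mumbai hub is down
```

The management plane would then push the chosen hub as the new WireGuard endpoint, as described above.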
Summary
WireGuard gives us:
- 5-7x throughput vs OpenVPN on the same embedded hardware
- Instant roaming when devices fail over between WAN links
- Minimal configuration surface — fewer moving parts, fewer misconfigurations
- Kernel-space processing — no userspace bottleneck, predictable latency
- Modern cryptography with no cipher negotiation or downgrade risk
The trade-off is that WireGuard does not handle key distribution, authentication infrastructure, or certificate management for you. We built that layer ourselves with Ansible and our provisioning pipeline. For a deployment at our scale, that trade-off is worth it — we would rather own the key management problem explicitly than wrestle with IPsec’s IKE complexity or OpenVPN’s TLS overhead.
WireGuard is not a complete SD-WAN solution. It is the tunnel primitive on top of which we build routing, failover, QoS, and monitoring. But it is the right primitive — fast, simple, and reliable.