Geographic Labyrinth, Part 2: Handshakes Up, Routing Down — Three Bugs That Bypassed the Chain

Part 2 of the Geographic Labyrinth series. Part 1 — A WireGuard Pinball Machine on a Single Box is the design overview and predicted footguns.

TL;DR

Six-container chain came up. WireGuard handshakes worked everywhere — wg show clean on all hops, encrypted UDP visible on every eth0 pcap. traceroute against 1.1.1.1 from every container produced "[SUCCESS]" output that looked great until I actually read it: hop 1 was 172.20.0.1 (Docker bridge) on every single trace, then straight to my real Spectrum upstream. The chain was never in the path.

Three independent root causes, all in the routing layer, any one of them sufficient to break it:

  1. Every middle hop's "next" peer listed only the downstream 10.0.x.0/24 subnets in AllowedIPs — never 0.0.0.0/0 — so WireGuard's cryptokey router had no match for 8.8.8.8 and the packet fell back to the container's default route (the Docker bridge).
  2. The configure-hop.sh startup script installed default via $NEXT_HOP dev wg0 metric 100 where $NEXT_HOP was the Docker bridge IP of the next container (172.20.0.x), not the wg-side IP (10.0.x.1). And at metric 100, that route lost to Docker's default at metric 0 anyway.
  3. rp_filter strict mode was still default in some containers, which would have dropped the asymmetric return path even if 1+2 had been fixed.

This post is the diagnosis arc — pcap forensics, the three findings, and the one-shot patch that fixes all three. Runtime end-to-end verification is the next session.

What I built (the architecture that was actually deployed)

I ended up with the simpler of the two patterns in my notes: one wg0 per container, multiple peers. WireGuard's longest-prefix cryptokey routing does the work of picking which peer to send a packet to. The alternative pattern — two interfaces per container (wg-in + wg-out) with per-hop Table = N policy routing — is more elegant for hops that also serve local traffic, but I didn't need that this round.

Topology:

client → primary → russia → canada → taiwan → australia → exit → Docker bridge → home gateway → ISP

container           eth0 (Docker)  wg0 address   listen   role
oblivion-primary    172.20.0.10    10.0.1.1/24   51820    entry
oblivion-russia     172.20.0.20    10.0.2.1/24   51821    middle hop
oblivion-canada     172.20.0.30    10.0.3.1/24   51822    middle hop
oblivion-taiwan     172.20.0.40    10.0.4.1/24   51823    middle hop
oblivion-australia  172.20.0.50    10.0.5.1/24   51824    middle hop
oblivion-exit       172.20.0.60    10.0.6.1/24   51825    exit (MASQ to eth0)

All six containers ran the oblivion-router:dev image built from upstream DBA1337TECH/OblivionEdge (Rust on a PREEMPT_RT Alpine kernel, FIPS-hardened OpenSSL, WireGuard tunnel layer; license clarified by the maintainer as Public Domain — credit is requested, not required). Compose pinned every container's IPv4 on a geo-labyrinth bridge so the chain endpoints were deterministic.

What worked out of the box

The control plane was healthy. From inside oblivion-russia:

interface: wg0
  public key: f2iFp/7XVn8KCPMn3v9jfN9OtG3yvfwvI++uNyjqLFs=
  private key: (hidden)
  listening port: 51821

peer: 6khhZIkNkTL0weBUu9izS9x/n6n/kGKD1dkWePuHNhY=     # primary
  endpoint: 172.20.0.10:51820
  allowed ips: 10.0.1.0/24
  latest handshake: 51 seconds ago
  transfer: 1.23 KiB received, 1.29 KiB sent
  persistent keepalive: every 25 seconds

peer: zQNt9sqvyf+xnLID1aHAFEcOmSlqt8IJtYFWKEiRhD8=     # canada
  endpoint: 172.20.0.30:51822
  allowed ips: 10.0.3.0/24, 10.0.4.0/24, 10.0.5.0/24, 10.0.6.0/24
  latest handshake: 52 seconds ago
  transfer: 1.20 KiB received, 1.32 KiB sent
  persistent keepalive: every 25 seconds

Recent handshakes on both peers and transfer counters incrementing, with the ~1.2 KiB totals amounting to nothing more than handshakes and persistent keepalives. Encrypted Handshake Initiation + Transport Data packets visible on every container's eth0 pcap in Wireshark, dissected cleanly as WireGuard protocol frames.

That part is real. The keys are right, the namespaces are right, the inter-container UDP reachability is right, the image build is right. If you stopped here and said "I built a multi-hop WireGuard chain," you'd be ~30% honest — the cryptography is doing what it should. The routing is what isn't.

What didn't work — the pcap that ended the celebration

The smoking gun was the oblivion-primary capture on its eth0. Primary is supposed to send all its traffic encrypted to russia (172.20.0.20:51821), and only the encrypted UDP to russia. Instead the pcap showed:

9   0.042963   172.20.0.10  → 104.26.12.205   TCP    [SYN]   to api.ipify.org:443
13  0.060859   172.20.0.10  → 104.26.12.205   TLSv1.3 Client Hello (SNI: api.ipify.org)
15  0.088128   104.26.12.205 → 172.20.0.10    TLSv1.3 Server Hello, ...

Primary's container hit ipify.org over plain TCP — no WireGuard encapsulation, no tunnel — straight out the bridge to my real ISP. The handshake UDP to russia was also present, churning along on its own, completely disconnected from the actual data path.

The oblivion-exit capture confirmed it from the other end:

16  0.952835   8.8.8.8       → 172.20.0.60   ICMP    Echo reply  ttl=113
20  1.449226   208.67.222.222 → 172.20.0.60  ICMP    Echo reply  ttl=52
22  1.454327   8.8.8.8       → 172.20.0.60   ICMP    Echo reply  ttl=113

ttl=113 is what Google's edge stamps with default TTL 128 minus ~15 hops over the real public internet. If this packet had taken the labyrinth, the TTL would be 113 minus another five hops of WireGuard transit. The exit container is hitting the real internet directly — not even pretending the chain exists.

And the canonical traceroute, which the test harness gleefully follows with [SUCCESS] for every hop:

docker exec oblivion-primary traceroute -n -m 10 8.8.8.8
 1  172.20.0.1     0.003 ms   ← Docker bridge
 2  192.168.86.1   34.687 ms  ← my home router
 3  192.168.2.42   39.242 ms  ← Spectrum CPE
 4  76.186.208.1   46.425 ms  ← Spectrum edge
 5  24.28.134.33   34.583 ms
 6  24.27.13.230   19.689 ms
 7  24.27.13.234   25.630 ms
 8  24.175.51.176  17.374 ms
 9  24.175.32.156  22.671 ms
10  *  *  *

Identical traceroute from every container in the chain. Nobody's traffic ever touched wg0. The [SUCCESS] tag in the harness output means traceroute exited 0 — it does not mean the path went through any tunnel. That's a measurement bug in the test harness as much as a routing bug in the chain.
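
A harness check that asserts on the path instead of the exit code might look like the sketch below. It assumes `traceroute -n` style output and that every wg-side address in this lab sits under 10.0.x.x, as in the addressing table above. The function name first_hop_in_tunnel is hypothetical and not part of the existing harness.

```shell
# Hypothetical helper: reads `traceroute -n` output on stdin and succeeds
# only if hop 1 is a wg-side address (10.0.x.x in this lab's addressing).
first_hop_in_tunnel() {
    # Hop lines look like " 1  172.20.0.1  0.003 ms". The header line
    # ("traceroute to ...") has a non-numeric first field, so matching
    # $1 == "1" selects hop 1 only.
    hop1=$(awk '$1 == "1" { print $2; exit }')
    case "$hop1" in
        10.0.*) return 0 ;;
        *)      echo "FAIL: hop 1 is '${hop1:-none}', not a tunnel IP" >&2
                return 1 ;;
    esac
}

# Usage inside the lab (not executed here):
#   docker exec oblivion-primary traceroute -n -m 10 8.8.8.8 | first_hop_in_tunnel
```

Fed the trace above, this fails on 172.20.0.1, which is the result the [SUCCESS] tag should have reported.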

Diagnosis: three independent root causes

#1: AllowedIPs never includes 0.0.0.0/0 anywhere in the chain

WireGuard's data plane is cryptokey routing: each peer claims a set of AllowedIPs, and a packet only enters that peer's tunnel if its destination matches. There is no default route via WireGuard — there is only "longest-prefix match across all my peers' AllowedIPs, on this interface."

The deployed wg0-primary.conf:

[Peer]   # russia
PublicKey = f2iFp/7XVn8KCPMn3v9jfN9OtG3yvfwvI++uNyjqLFs=
AllowedIPs = 10.0.2.0/24, 10.0.3.0/24, 10.0.4.0/24, 10.0.5.0/24, 10.0.6.0/24

A packet to 8.8.8.8 matches none of those prefixes, so nothing steers it into the tunnel: no route toward wg0 covers it, and WireGuard would drop it for lack of a matching peer even if one did. The kernel's lookup lands on default via 172.20.0.1 dev eth0 and ships the packet straight to Docker's bridge gateway. This single bug is enough to bypass the entire chain.

Same pattern in every other hop's "next" peer entry — narrow 10.0.x.0/24 lists, never the catch-all. The chain was wired for return paths but never for outbound.

The fix is a single principle: on each middle hop, the peer toward the internet gets AllowedIPs = 0.0.0.0/0; the peer toward the client gets just the upstream tunnel /24. Longest-prefix match ensures the narrow peer wins for return packets to 10.0.x.0/24, so the 0.0.0.0/0 on the forward direction is safe — it only catches what doesn't match anything more specific.

After the patch:

# wg0-russia.conf
[Peer]   # primary (return path only)
AllowedIPs = 10.0.1.0/24
[Peer]   # canada (catch-all -- the chain forwarder)
AllowedIPs = 0.0.0.0/0
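
To see why the catch-all is safe next to a narrow return prefix, here is a minimal longest-prefix-match sketch in plain shell. It only imitates the selection rule; it is not WireGuard's implementation, and the helper names (ip_to_int, in_cidr, pick_peer) are made up for illustration.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 falls inside CIDR $2.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    # A /0 prefix matches every address.
    if [ "$len" -eq 0 ]; then return 0; fi
    mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# pick_peer DEST name=cidr ... : print the peer whose prefix matches DEST
# with the longest length, or "none" if nothing matches.
pick_peer() {
    dest=$1; shift
    best=""; best_len=-1
    for entry in "$@"; do
        name=${entry%%=*}; cidr=${entry#*=}
        if in_cidr "$dest" "$cidr" && [ "${cidr#*/}" -gt "$best_len" ]; then
            best=$name; best_len=${cidr#*/}
        fi
    done
    echo "${best:-none}"
}

pick_peer 10.0.1.7 primary=10.0.1.0/24 canada=0.0.0.0/0     # prints: primary
pick_peer 8.8.8.8  primary=10.0.1.0/24 canada=0.0.0.0/0     # prints: canada
pick_peer 8.8.8.8  primary=10.0.1.0/24 canada=10.0.3.0/24   # prints: none
```

The third call mirrors the broken configs: with only narrow prefixes on every peer, an internet-bound destination has no tunnel to enter.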

#2: The default-route override was malformed

configure-hop.sh tried to take over the container's default route:

ip route add default via ${NEXT_HOP} dev wg0 metric 100

Two compounding errors:

  • ${NEXT_HOP} was the Docker bridge IP of the next container (e.g., 172.20.0.30 for russia → canada). That's not on wg0; the wg-side subnet for that hop is 10.0.3.0/24. The route reads as "default via 172.20.0.30 dev wg0", which the kernel accepts but can't actually use; there's no path from wg0 to a 172.20.0.0/16 address. The correct gateway is the wg-side IP of the next hop: 10.0.3.1.
  • Even with the correct gateway, metric 100 is worse than the Docker default's metric 0, so the kernel keeps using the Docker bridge.

The fix moves default-route management out of the startup script and into each wg0.conf's PostUp:

PostUp = ip route del default 2>/dev/null || true; \
         ip route add default via 10.0.2.1 dev %i metric 50

Lower metric, correct gateway, and wg-quick tears it down cleanly on shutdown via the matching PostDown.
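
A cheap guard against the bridge-IP mix-up is to keep the two addresses in separately named variables and refuse anything that is not a wg-side address as the wg0 gateway. The sketch below assumes this lab's addressing (172.20.0.x on the bridge, 10.0.x.x inside the tunnels). The function name check_tunnel_gateway is hypothetical, and the route command is only echoed, never run.

```shell
# Hypothetical guard for a startup script: accept only a wg-side address
# as the gateway for a route on wg0.
check_tunnel_gateway() {
    case "$1" in
        10.0.*)   return 0 ;;
        172.20.*) echo "refusing $1: Docker bridge IP, not a wg-side IP" >&2
                  return 1 ;;
        *)        echo "refusing $1: outside this lab's tunnel range" >&2
                  return 1 ;;
    esac
}

NEXT_HOP_BRIDGE_IP=172.20.0.30   # where the encrypted UDP goes (Endpoint)
NEXT_HOP_TUNNEL_IP=10.0.3.1      # where the inner default route should point

check_tunnel_gateway "$NEXT_HOP_TUNNEL_IP" \
    && echo "would run: ip route add default via $NEXT_HOP_TUNNEL_IP dev wg0"
```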

#3: rp_filter would have dropped the return path

Strict-mode reverse-path filtering (net.ipv4.conf.all.rp_filter=1, the default on most distros) rejects packets arriving on an interface that the kernel wouldn't have used to send a reply. In a chain, return traffic naturally arrives on a different interface than the one that sent the request, so strict mode silently drops everything. The handshake survives (it's symmetric), so wg show keeps lying to you that everything's fine.

This one didn't manifest in the pcaps yet because bugs #1 and #2 prevented any chain traffic from ever existing to be filtered. But the moment those are fixed, this becomes the next thing to break — so the patch sets rp_filter=2 (loose) on all, default, eth0, and wg0 of every container.
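
The kernel applies the larger of conf.all.rp_filter and the per-interface value when validating a packet's source, so it is worth checking the effective mode, not just one knob. A minimal sketch follows, assuming a Linux /proc layout. The helper names are hypothetical, and report_rp_filter is defined but not called here.

```shell
# Effective rp_filter for an interface is max(conf.all, conf.<iface>).
# Arguments: the "all" value, then the per-interface value.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Print "<iface><TAB><effective mode>" for every interface in this
# network namespace (0 = off, 1 = strict, 2 = loose).
report_rp_filter() {
    all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
    for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
        iface=${f%/rp_filter}; iface=${iface##*/}
        printf '%s\t%s\n' "$iface" "$(effective_rp_filter "$all" "$(cat "$f")")"
    done
}

# Usage inside a container (not executed here):
#   docker exec oblivion-russia sh -c '. ./rp.sh; report_rp_filter'
# where rp.sh is a hypothetical file holding the two functions above.
```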

Latent: exit's FORWARD policy

configure-exit.sh does iptables -P FORWARD DROP and then only allows ESTABLISHED,RELATED. New outbound connections initiated from the chain (which is most useful traffic) would never get an ESTABLISHED state to match. Patched by adding iptables -A FORWARD -i wg0 -o eth0 -j ACCEPT in exit's PostUp. The DROP policy only applies to packets that match no rule, so the explicit ACCEPT lets new chain-originated connections through.

The patch

I wrote a one-shot bash script that:

  • Extracts the existing PrivateKeys and peer PublicKeys from the current wg0-*.conf files (no key rotation, no hardcoded values)
  • Backs up every wg0-*.conf and configure-hop.sh to *.bak.<timestamp>
  • Regenerates each wg0-*.conf from a template with:
    • Correct asymmetric AllowedIPs (narrow return, broad forward)
    • PreUp sysctls for ip_forward and rp_filter=2 on all/default/eth0
    • PostUp for rp_filter=2 on %i, FORWARD ACCEPT, POSTROUTING MASQUERADE, default-route override via the wg-side gateway at metric 50
    • Matching PostDown for clean teardown
  • Comments out the broken default-route block in configure-hop.sh with a PATCHED-BY: marker so re-runs are idempotent
  • Leaves the exit container's default route alone (real internet exits through its eth0) and adds an explicit FORWARD -i wg0 -o eth0 -j ACCEPT

Generated wg0-russia.conf after patch:

[Interface]
PrivateKey = EGj95eYBcuTmElWPddkWQw9rymzQi6jVf8X5ibcXKVU=
Address = 10.0.2.1/24
ListenPort = 51821
PreUp = sysctl -w net.ipv4.ip_forward=1; \
        sysctl -w net.ipv4.conf.all.rp_filter=2; \
        sysctl -w net.ipv4.conf.default.rp_filter=2; \
        sysctl -w net.ipv4.conf.eth0.rp_filter=2
PostUp = sysctl -w net.ipv4.conf.%i.rp_filter=2; \
         iptables -A FORWARD -i %i -j ACCEPT; \
         iptables -A FORWARD -o %i -j ACCEPT; \
         iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE; \
         ip route del default 2>/dev/null || true; \
         ip route add default via 10.0.3.1 dev %i metric 50
PostDown = iptables -D FORWARD -i %i -j ACCEPT; \
           iptables -D FORWARD -o %i -j ACCEPT; \
           iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE; \
           ip route del default via 10.0.3.1 dev %i metric 50 2>/dev/null || true

[Peer]   # primary (return path only)
PublicKey = 6khhZIkNkTL0weBUu9izS9x/n6n/kGKD1dkWePuHNhY=
AllowedIPs = 10.0.1.0/24
Endpoint = 172.20.0.10:51820
PersistentKeepalive = 25

[Peer]   # canada (catch-all -- chain forwarder)
PublicKey = zQNt9sqvyf+xnLID1aHAFEcOmSlqt8IJtYFWKEiRhD8=
AllowedIPs = 0.0.0.0/0
Endpoint = 172.20.0.30:51822
PersistentKeepalive = 25

The full script is in the repo and runs against any directory containing the wg0-*.conf set plus scripts/configure-hop.sh.

What I expect to see after the patch

Validation traceroute from the entry container, in order:

docker exec oblivion-primary traceroute -n -m 8 1.1.1.1
 1  10.0.2.1   ← russia (wg)
 2  10.0.3.1   ← canada (wg)
 3  10.0.4.1   ← taiwan (wg)
 4  10.0.5.1   ← australia (wg)
 5  10.0.6.1   ← exit (wg)
 6  172.20.0.1 ← Docker bridge (exit's real next hop)
 7+ <real WAN hops from exit's perspective>

If hop 1 is still 172.20.0.1, the AllowedIPs = 0.0.0.0/0 change didn't take effect, usually because the running wg0 was never rebuilt from the edited conf. The fix is the heavy hammer:

docker exec oblivion-primary wg-quick down wg0
docker exec oblivion-primary wg-quick up wg0
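
To confirm what the running interface actually holds, the output of wg show wg0 allowed-ips can be checked for the catch-all. This is a small sketch: it assumes the usual one-line-per-peer output (public key, then that peer's prefixes), and has_catch_all is a made-up name.

```shell
# Reads `wg show wg0 allowed-ips` output on stdin (one line per peer:
# public key first, then that peer's AllowedIPs) and succeeds if any
# peer carries the 0.0.0.0/0 catch-all.
has_catch_all() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "0.0.0.0/0") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage inside the lab (not executed here):
#   docker exec oblivion-primary wg show wg0 allowed-ips | has_catch_all \
#       || echo "running wg0 has no 0.0.0.0/0 peer: config is stale"
```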

For independent confirmation that the path is real and not just cosmetic, capture on exit's eth0 in parallel with a curl from primary:

# Terminal A
docker exec oblivion-exit tcpdump -i eth0 -nn -c 20 'tcp port 443'

# Terminal B
docker exec oblivion-primary curl -s https://api.ipify.org

If the chain is real, the SYN to api.ipify.org (104.26.x.x) should be sourced from 172.20.0.60 (exit's bridge IP), not from 172.20.0.10 (primary's). That's the proof that traffic actually transited five WireGuard tunnels before leaving the box.
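
That source-address check can be scripted too. The sketch below pulls the source IP of the first SYN out of tcpdump's text output. It assumes the default `tcpdump -nn` line layout for IPv4 (timestamp, the word IP, then source.port), and syn_source_ip is a hypothetical helper name.

```shell
# Reads `tcpdump -nn` text output on stdin and prints the source IP of
# the first pure SYN (Flags [S]; SYN-ACKs show as [S.] and are skipped).
syn_source_ip() {
    awk '$2 == "IP" && /Flags \[S\]/ {
             sub(/\.[0-9]+$/, "", $3)   # strip the trailing ".port"
             print $3
             exit
         }'
}

# Usage inside the lab (not executed here):
#   src=$(docker exec oblivion-exit tcpdump -i eth0 -nn -c 20 'tcp port 443' | syn_source_ip)
#   [ "$src" = "172.20.0.60" ] && echo "egress is exit" || echo "egress is $src"
```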

Update 2026-06-09: runtime-verified the pattern, with corrections

Built a 3-hop variant of this on Docker Desktop (Mac as entry, 2 containers as middle+exit) to runtime-verify the patched config from this post. Four more bugs surfaced that aren't in the original patch:

  1. PostUp = sysctl -w net.ipv4.conf.%i.rp_filter=2 fails. /proc/sys is read-only inside an unprivileged Docker container; even per-interface writes for wg0 (which exists post-ip link add) get rejected. The fix is to set net.ipv4.conf.default.rp_filter=2 in the compose sysctls: block — newly-created interfaces inherit from .default, and MAX(.all=2, wg0)=2 covers the rest. No PostUp sysctl needed.

  2. ip route add default dev %i metric 50 doesn't replace the bridge default. The Docker bridge default is at metric 0 (kernel default). Lower metric wins, so adding a metric-50 route leaves the bridge default in place and the chain is still bypassed. The fix: ip route replace default dev %i with no explicit metric.

  3. Table = off + replacing the default route creates a routing loop for WG control-plane traffic. Middle's own WG handshake replies to the upstream peer (Mac, in the runtime test) also go through dev wg0 → get caught by cryptokey routing → re-encrypted into the chain → upstream never sees the response. wg-quick's Table = auto solves this with fwmark, but Table = auto runs the doomed net.ipv4.conf.all.src_valid_mark sysctl. The fix is to replicate the fwmark split manually in PostUp:

    PostUp = wg set %i fwmark 51820
    PostUp = ip rule add not fwmark 51820 table 51820
    PostUp = ip rule add table main suppress_prefixlength 0
    PostUp = ip route add default dev %i table 51820
    

    WG-emitted UDP carries the mark → main-table default via eth0 (control plane works). Decapsulated forwarded chain traffic doesn't carry the mark → custom table 51820 → default dev wg0 (chain forwarding works). No loop.

  4. \-continuation in wg-quick PostUp lines is silently broken. Every PostUp/PreUp in the patch above uses ; \ continuation, but wg-quick (Alpine 3.21, wireguard-tools v1.0.20210914) reads line-by-line and does NOT join \-continued lines. The second and later lines become bogus keys and wg setconf aborts. The fix is multiple PostUp = ... lines, each a complete command.
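
The continuation problem in item 4 is easy to lint for before a config ever reaches wg-quick. A minimal sketch, with a hypothetical function name, that flags any line ending in a backslash:

```shell
# Prints every line (prefixed with its line number) that ends in a
# backslash, optionally followed by whitespace. Succeeds only when no
# such line exists in the given file(s).
find_continuations() {
    if grep -nE '\\[[:space:]]*$' "$@"; then
        return 1
    fi
    return 0
}

# Usage (not executed here; the file name follows the wg0-*.conf pattern above):
#   find_continuations wg0-russia.conf || echo "split these into separate PostUp lines"
```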

The full corrected, runtime-verified reference implementation lives at dotfiles/macos/wg-labyrinth — clean up / down lifecycle, fwmark routing baked in, all the gotchas above encoded in its template. A distilled config-template + gotcha-catalog version is published at snippet #6 for readers who want a stable URL reference without cloning the dotfiles repo.

The wiki page WireGuard in Containers has been updated to reflect all four corrections in its "Default route override" and "Reverse-path filter" sections.

What I still need to verify

Honest scope of "fixed":

  • Verified by inspection: the patched configs have correct AllowedIPs, the default-route override uses the right gateway and a better metric, rp_filter=2 is set everywhere, the exit FORWARD rule is in place.
  • Not yet verified at runtime: that the chain actually carries packets end-to-end with the patches applied. The pcaps I have are from the broken state. Next session: redeploy with patched configs, capture again, confirm a curl from primary egresses out exit's eth0.
  • Open question: the PrivateKey = $(cat router_configs/keys/exit.key) placeholder in the original wg0-exit.conf was preserved verbatim by my patcher. If whatever envsubst step the deployment uses isn't running, wg-quick up wg0 in the exit container will refuse with "key must be 32 base64-decoded bytes". Easy to spot in docker logs oblivion-exit; replace with the real key before bringing up.
  • Cosmetic limitation: even with the patch working, the traceroute will show the wg-side IPs (10.0.x.1) of each hop, not the simulated country IPs. That's the ICMP TTL-exceeded responder daemon from Part 1's footgun list, which is still Part 3 work — a Scapy script that listens for ICMP echo with low TTLs and replies with Time exceeded from the assigned country prefix.
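
The unexpanded-placeholder problem from the open question above can be caught before wg-quick ever runs. A WireGuard key is 32 bytes, which base64-encodes to 43 characters plus one = of padding, so a format check is enough to reject a literal $(cat ...). This is a sketch with a hypothetical function name; it checks the shape of the key, not whether it is the right one.

```shell
# Succeeds only if the file has at least one PrivateKey line and every
# PrivateKey line holds 43 base64 characters followed by a single '='.
private_key_looks_real() {
    # No PrivateKey line at all is also a failure.
    grep -q '^PrivateKey' "$1" || return 1
    # Fail if any PrivateKey line does not match the expected shape.
    if grep '^PrivateKey' "$1" |
       grep -qvE '^PrivateKey[[:space:]]*=[[:space:]]*[A-Za-z0-9+/]{43}=[[:space:]]*$'; then
        return 1
    fi
    return 0
}

# Usage (not executed here):
#   private_key_looks_real wg0-exit.conf || echo "PrivateKey is still a placeholder"
```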

What Part 1's predictions got right and wrong

Looking back at the five footguns I called out before the build:

Predicted                        Verdict
Unprivileged container sysctls   ✓ Correct. The sysctls: block in compose was the right pattern, no /proc/sys writes attempted
rp_filter drops packets          ✓ Correct in principle. Didn't manifest yet because bigger bugs prevented chain traffic from existing
SNAT requirement                 Partial. MASQUERADE works for the exit, but the per-hop SNAT to fake country IPs is still Part 3 work
Traceroute responder             Still pending. See "cosmetic limitation" above
MTU stacking                     Not measured yet. No chain traffic to measure

What I didn't predict: the AllowedIPs catch-all gotcha and the malformed default route. Those aren't classical WireGuard pitfalls — they're "I wired the cryptokey routing for return paths and forgot the forward direction" plus "I conflated the Docker bridge IP with the wg-side IP." Both feel obvious in hindsight; neither was obvious from the existing literature on multi-hop WireGuard, which mostly assumes you're using wg-quick's AllowedIPs = 0.0.0.0/0 shortcut and never thinks about what happens if you don't.

The diagnostic that finally cracked it was reading the actual reply TTLs in the exit pcap — ttl=113 was unmistakably "real internet" not "5-hop chain." Once I saw that, the rest was config archaeology.

Lessons that generalize

  1. wg show healthy + traceroute exit 0 ≠ chain is working. Both the control plane and the test harness can lie to you. Look at the actual TTL in the first reply packet on the exit's egress interface.
  2. Cryptokey routing has no implicit default. Anything not in AllowedIPs doesn't get encrypted — it gets sent in the clear via the host's default route. A chain that never lists 0.0.0.0/0 anywhere never carries internet-bound traffic.
  3. ${NEXT_HOP} is dangerously ambiguous in a multi-network setup. It can mean "the next hop's docker bridge IP" or "the next hop's tunnel IP" — and they're different. Name your variables NEXT_HOP_BRIDGE_IP and NEXT_HOP_TUNNEL_IP from day one.
  4. Test harnesses should validate the path, not just the syscall. A traceroute "succeeded" means traceroute exited cleanly. A traceroute that took the path you wanted requires asserting on the actual hops, not on the return code.

What's next (Part 3 territory)

  • Bring the patched chain up and run the proof-of-life validation
  • Per-hop SNAT to the actual country prefixes (5.101.x for Russia, 174.95.x for Canada, etc.) — currently the chain only proves transit, not the geographic illusion
  • The Scapy ICMP TTL-exceeded responder so traceroute actually shows the labyrinth instead of * * *
  • MTU measurement under load with ping -M do -s N
  • The host-side fwmark + ip rule selective entry, so only marked traffic from the host enters the labyrinth and everything else continues direct
  • WireGuard in Containers — the canonical operator reference. After this session I tightened the asymmetric-AllowedIPs section, added a "default route override is your responsibility" subsection, and added a diagnostic recipe that catches exactly the silent-bypass mode this post documents.
  • VPN Recommendation — where self-hosted multi-hop sits next to Mullvad/IVPN/Tor.
