Final Report: Simplify LibreMesh and get it closer to OpenWrt

This summer I focused on removing LibreMesh-specific glue where OpenWrt already has a solid upstream solution, and on tightening integration with OpenWrt services:

  • Task 1 — Reboots via OpenWrt’s watchcat: new lime-hwd-watchcat module generates /etc/config/watchcat from LibreMesh UCI and reloads the service automatically. It replaces the custom deferrable-reboot.
  • Task 2 — Move DHCP to odhcpd + cluster-wide lease sharing: prototype package that makes odhcpd the main DHCPv4 server (maindhcp=1) and shares leases across nodes via CRDTs, using ubus and leasetrigger.
  • Task 3 — Remove VLAN wrappers from Babel: PR switches lime-proto-babeld to run on base ifaces and br-lan (DSA), adds a small nftables guard to avoid L2 multicast leaking Babel traffic from bat0 into the bridge.
  • Task 4 — Plan for an L3-only LibreMesh profile: opened an L3-only mesh issue (no batman-adv/anygw) to scope the work and collect feedback.

Project goals

  • Simplify LibreMesh by adopting OpenWrt-native components when they are equivalent or better.
  • Reduce maintenance burden in LibreMesh packages by deleting code where upstream covers it.
  • Keep behavior observable & testable with CI/unit tests and simple field tests on real routers.

What was built

1) Integrate OpenWrt’s watchcat via LibreMesh HWD

Why: OpenWrt ships watchcat for scheduled reboots and network checks. Using the upstream package avoids maintaining custom reboot logic in LibreMesh and gains a UI that OpenWrt users already know.

What: lime-hwd-watchcat (Lua) reads LibreMesh UCI (config hwd_watchcat), generates the corresponding /etc/config/watchcat sections, cleans stale ones, and reloads the service. The package lives in lime-packages and targets stock OpenWrt watchcat.
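To make the mapping concrete, here is a minimal sketch of the rendering step. It is written in Python for illustration only; the real module is Lua and goes through UCI rather than writing text. The function name and the example input are hypothetical, and the option names (mode, period, pinghosts, forcedelay) are the ones I understand stock watchcat to use, so check them against the watchcat version you run.

```python
def render_watchcat(sections):
    """Render a list of option dicts as /etc/config/watchcat text."""
    lines = []
    for opts in sections:
        # One anonymous "watchcat" section per entry.
        lines.append("config watchcat")
        for key, value in opts.items():
            lines.append(f"\toption {key} '{value}'")
        lines.append("")
    return "\n".join(lines)


# Hypothetical input, as it might be read from the LibreMesh UCI config.
example = [
    {
        "mode": "ping_reboot",
        "period": "6h",
        "pinghosts": "8.8.8.8",
        "forcedelay": "30",
    },
]

print(render_watchcat(example))
```

Regenerating the whole file from LibreMesh's config on every run is also what makes the cleanup of stale sections simple: anything not present in the input is not written back.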

Pull request of this task

Code of this task

Result: Reboots and connectivity checks are handled by upstream watchcat, visible in LuCI, with configuration owned by LibreMesh.

2) Replace dnsmasq DHCP with odhcpd

Why: odhcpd is OpenWrt’s native DHCP/RA daemon, well-integrated with ubus, and already authoritative for IPv6; enabling it for IPv4 simplifies the stack.

What: prototype package shared-state-odhcpd_leases:

  • UCI defaults set dhcp.odhcpd.maindhcp=1, point leasetrigger to our publisher, and register a community-scoped CRDT.
  • Publisher: on lease change, call ubus dhcp ipv4leases, distill to IP → {mac, hostname}, publish to CRDT.
  • Generator: on CRDT updates, atomically write /tmp/ethers.mesh and reload odhcpd so foreign leases are served locally too.
    (“Leasetrigger” behavior is an odhcpd feature used here to emit on lease changes.)
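
The publisher and generator steps can be sketched roughly as follows. This is a Python illustration, not the package's code. The JSON shape assumed for the ubus reply (a "device" map whose entries carry a "leases" list with "mac", "hostname" and "address" fields) is my assumption and may differ between odhcpd versions, and all function names are hypothetical.

```python
import json
import os
import tempfile


def distill_leases(ubus_json):
    """Reduce lease JSON to {ip: {"mac": ..., "hostname": ...}}."""
    data = json.loads(ubus_json)
    table = {}
    # Assumed shape: {"device": {"<iface>": {"leases": [ {...}, ... ]}}}
    for dev in data.get("device", {}).values():
        for lease in dev.get("leases", []):
            table[lease["address"]] = {
                "mac": lease["mac"],
                "hostname": lease.get("hostname", ""),
            }
    return table


def format_mac(mac):
    """Normalise 'aabbccddeeff' or 'aa:bb:...' to colon-separated lowercase."""
    digits = mac.replace(":", "").lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def write_ethers_atomically(table, path):
    """Write 'MAC IP' lines to a temp file, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        for ip in sorted(table):
            handle.write(f"{format_mac(table[ip]['mac'])} {ip}\n")
    # The rename is atomic on the same filesystem, so a reader never
    # sees a half-written file.
    os.replace(tmp, path)
```

Writing to a temporary file in the same directory and renaming it is the usual way to get the "atomically write" behavior: odhcpd either reads the old file or the new one, never a partial one.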

In the field: two LibreMesh nodes converge on the same lease table within seconds; uci show dhcp reflects maindhcp=1. Tech references for odhcpd and UCI options: docs & source.
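
Why do nodes converge? A toy model helps. The sketch below is a generic last-writer-wins map in Python; it is not necessarily how LibreMesh's shared-state resolves conflicts, and the "stamp" field is a hypothetical per-entry version counter I added for illustration. The point is only that a merge which is commutative and idempotent lets nodes exchange tables in any order and still end up with the same result.

```python
def merge(local, remote):
    """Merge two lease maps, keeping the newer entry for each IP."""
    merged = dict(local)
    for ip, entry in remote.items():
        current = merged.get(ip)
        # Compare (stamp, mac) so ties are broken deterministically.
        if current is None or (entry["stamp"], entry["mac"]) > (
            current["stamp"],
            current["mac"],
        ):
            merged[ip] = entry
    return merged


node_a = {"10.13.0.50": {"mac": "aa:bb:cc:dd:ee:ff", "stamp": 2}}
node_b = {
    "10.13.0.50": {"mac": "11:22:33:44:55:66", "stamp": 1},
    "10.13.0.51": {"mac": "de:ad:be:ef:00:01", "stamp": 1},
}

# Both merge orders give the same table.
print(merge(node_a, node_b) == merge(node_b, node_a))
```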

Pull Request of this task

3) Remove VLAN interfaces from Babel (run on base ifaces / br-lan)

Why: Historically LibreMesh created per-radio VLANs for Babel. On modern DSA targets this adds complexity and hides interface intent. Running babeld on br-lan (wired) and radios directly is simpler and yields the right link metrics by default.

What: rewrites lime-proto-babeld to:

  • Configure Babel on base ifaces and add a br-lan interface (type=wired).
  • Drop the legacy “VLAN-on-wlan” indirection.
  • Add a small nftables ingress guard on bat0 to prevent Babel L2 multicast flooding into br-lan, which could otherwise create “ghost wired neighbors”.
    (Babel uses UDP 6696 and link-local multicast ff02::1:6 / 224.0.0.111; the guard drops those on bat0.)
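
The match the guard performs can be written down as a small predicate. This is a Python illustration of the logic only; the actual guard is an nftables rule in the PR, and the function name here is hypothetical. The port and group addresses are the ones quoted above.

```python
import ipaddress

BABEL_PORT = 6696
BABEL_GROUPS = {
    ipaddress.ip_address("ff02::1:6"),
    ipaddress.ip_address("224.0.0.111"),
}


def is_babel_multicast(dst, protocol, dport):
    """True if a packet looks like Babel multicast and should be dropped on bat0."""
    return (
        protocol == "udp"
        and dport == BABEL_PORT
        and ipaddress.ip_address(dst) in BABEL_GROUPS
    )


# Babel hello over IPv6 link-local multicast: matches.
print(is_babel_multicast("ff02::1:6", "udp", 6696))
# mDNS traffic: does not match, so it is left alone.
print(is_babel_multicast("224.0.0.251", "udp", 5353))
```

Keeping the match this narrow means the guard only affects Babel's own multicast; all other traffic crossing bat0 is untouched.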

Notes: The PR targets DSA (not swconfig). It also documents what to add in lime-node to enable babeld proto explicitly. See PR conversation, examples and nftables snippet.

Pull Request of this task

4) Toward an L3-only LibreMesh profile

What: Today LibreMesh typically mixes BATMAN-adv (L2) and Babel (L3): BATMAN-adv makes the whole mesh look like one big Ethernet (single broadcast domain), while Babel does hop-by-hop IP routing. An L3-only profile would switch fully to Babel for mesh connectivity, drop BATMAN-adv and anygw, and give each node its own LAN prefix.

I opened an issue to scope the work: goals, migration path, and checks the community can run while we iterate.

  • Before (BATMAN-adv + anygw): clients across the mesh often see the same IPv4 gateway address; the mesh feels like one flat LAN.
  • After (L3-only): each node runs a DHCP server for its /24 (or /64 in IPv6). Clients connected to node A live in A’s subnet; traffic to node B’s subnet routes via Babel.

Issue of this task

Minimum plan

  • Drop L2 mesh: exclude BATMAN-adv bits (lime-proto-batadv, ubus-lime-batman-adv) and remove anygw.
  • Keep it pure L3: enable Babel (lime-proto-babeld) on mesh radios and on br-lan. (Related: my PR removes the legacy VLAN-on-wlan layer so Babel runs directly on the base ifaces.)
  • Per-node addressing: use LibreMesh’s main_ipv4_address/main_ipv6_address patterns to give every router its own LAN prefix.
  • DHCP/RA: ensure a DHCP/RA server per node.
  • Gateways: if you want “plug Internet anywhere,” pair L3-only with babeld-auto-gw-mode so default routes are advertised only when WAN is healthy.
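
To illustrate the per-node addressing item, here is a small Python sketch that derives a node's LAN prefix from its MAC address. It assumes template placeholders of the form %M1 to %M6 standing for the six MAC bytes in decimal; LibreMesh's real template syntax and defaults may differ, and the function name, pattern and MAC below are made up for the example.

```python
import ipaddress


def expand_ipv4_pattern(pattern, mac):
    """Replace %M1..%M6 with the decimal MAC bytes and parse the result."""
    octets = [int(part, 16) for part in mac.split(":")]
    for index, value in enumerate(octets, start=1):
        pattern = pattern.replace(f"%M{index}", str(value))
    return ipaddress.ip_interface(pattern)


# Hypothetical pattern: the last two MAC bytes pick the node's /24.
iface = expand_ipv4_pattern("10.%M5.%M6.1/24", "a8:40:41:1c:84:05")
print(iface)          # node address with prefix length
print(iface.network)  # the LAN prefix this node would announce via Babel
```

Because the prefix is derived from the MAC, two nodes only collide if their last two MAC bytes match, which is something a real profile would need to detect or document.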

Current state (as of Aug 25, 2025)

  • watchcat integration: packaged in lime-packages (app + docs page) and usable today; config lives in LibreMesh UCI, rendering to standard OpenWrt watchcat.
  • odhcpd migration: working prototype with CRDT lease sharing; evaluation and broader testing tracked in Issue #1199.
  • Babel changes: PR #1210 open with review/approvals from maintainers.
  • L3-only profile: Issue #1211 open for community discussion and follow-ups.

What’s left / next steps

  • watchcat: collect feedback from communities and tune sane defaults per device profiles.
  • odhcpd: expand tests (IPv6 RA/DHCPv6 + v4 coexistence), harden failure modes, and benchmark lease propagation on dense meshes. Track in #1199.
  • Babel PR #1210: finalize DSA docs, test swconfig fallback or explicitly gate to DSA-only in code, and validate the bat0 ingress guard on more topologies.
  • L3-only: produce a minimal profile and a migration guide (no batman-adv, no anygw).

Lessons learned

  • Upstream keeps projects healthy. Using existing, maintained components cuts long-term maintenance and aligns you with a larger ecosystem.
  • Small, reviewable changes earn fast feedback. I saw how clear PR descriptions, minimal diffs, and reproducible test notes make reviews smoother, because maintainers can reason about the change quickly.
  • Communication is part of the code. Good issues/PRs explain why, not just what; they show logs, decisions, and trade-offs.
  • Deleting code is a feature. Replacing custom code with upstream tools (watchcat instead of deferrable-reboot or odhcpd instead of dnsmasq-DHCP) taught me that reducing surface area often improves reliability.
  • Open source community. I learned a lot about how an open source community works: everyone makes it a priority that other people can understand and use the code easily.

Other links and references

Project page

Midterm Blog

Do we need VLAN for Babeld interfaces? (Mailing list discussion)

Conclusion

This summer I worked and learned a lot. I’m proud to have met my mentors’ expectations and contributed to a project like LibreMesh.

I want to thank my mentors Ilario and Javier for trusting me and giving me this opportunity, and especially for their patience and commitment. They were amazing!
