GSoC 2024 Qaul: Qaul BLE module for Linux: Part-III


This is the final post in this blog series. In part 2, I discussed the theory behind Qaul BLE support for Linux and walked through creating a BLE GATT server, advertising services, and scanning for devices that advertise qaul services. We also discussed the theory of message transfer through BLE GATT characteristics. Please read that post before proceeding.

In this blog, we will explore the implementation and low-level design of the BLE support in greater depth.

Correction to the previous blog

In the previous blog, while creating the GATT server, we defined two services, “main_service” and “msg_service”, with their respective characteristics. The issue is that multiple services lead to longer advertisement messages, which are only supported by extended advertisements. So, for compatibility with normal advertisements, we now use only “main_service”, which encapsulates both characteristics, “mainChar” and “msgChar”. Everything else remains the same.

Introduction

We will begin by configuring the services and characteristics of the qaul device so that it can send libqaul messages to other devices, and by spawning a new thread for receiving messages from all nearby devices. For lossless transfer, we enclose our data inside “$$” delimiters and then break it into byte arrays. In the spawned message listener, we receive messages from all nearby qaul devices and separate them into a map of type <qaul_id, message>. A message starts at “$$” and ends when the closing “$$” is encountered. The message is then sent to libqaul for decoding.

Message receiver

The message is received by “msgChar” when another device overwrites the value of this characteristic. While configuring msgChar, we created a msg_charcteristic_ctrl, which emits a CharacteristicControlEvent::Write(write) event when the characteristic is written via io_write. On accepting the write request, the data is read into read_buffer, which is then sent to ble_message_sender for further processing of the bytes. Ble_message_reciever receives the byte arrays in chunks of 20 bytes and maintains a map from qaul_id to the message received so far. As soon as the closing $$ is encountered, the message is sent to libqaul.

This architecture works by spawning two new threads, one for receiving data from other devices and the other for processing the received bytes, so that no bytes are lost in transfer.
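The reassembly step described above can be sketched as follows. This is a minimal Python model, not the Rust code of the actual ble_module: the class name MessageAssembler and the method feed are hypothetical, and it assumes the payload itself never produces a buffer that ends in “$$” before the real closing delimiter arrives.

```python
DELIMITER = b"$$"


class MessageAssembler:
    """Collects chunks per sender until a complete "$$...$$" frame is seen.

    Hypothetical helper for illustration; names are not taken from qaul.
    """

    def __init__(self):
        # Maps qaul_id -> bytes received so far for the current message.
        self.buffers = {}

    def feed(self, qaul_id, chunk):
        """Add one chunk; return the payload once the frame is complete."""
        buffer = self.buffers.setdefault(qaul_id, bytearray())
        buffer.extend(chunk)
        complete = (
            len(buffer) >= 2 * len(DELIMITER)
            and buffer.startswith(DELIMITER)
            and buffer.endswith(DELIMITER)
        )
        if not complete:
            return None
        # Drop the state for this sender and strip both delimiters.
        del self.buffers[qaul_id]
        return bytes(buffer[len(DELIMITER):-len(DELIMITER)])
```

Because the check runs on the accumulated buffer rather than on the single chunk, a closing delimiter that is split across two chunks is still detected, and chunks from different devices can arrive interleaved.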

Message Sender

On discovering a nearby qaul device that advertises the same main_service and read_char as any qaul device, libqaul starts to send routing messages at regular intervals so that other devices can update their routing tables. All other public or personal messages are transmitted in the same way.

On receiving a “DirectSend” request from libqaul, ble_module adds the delimiter (“$$”) to the start and end of the message. It then breaks the message into arrays of 20 or fewer bytes each and pushes each one into a queue. The GATT client then connects to the specified server and tries to io_write the message into the value of the server’s characteristic, which eventually triggers the Write event of characteristic_ctrl.

While writing data to another device, the GATT client also updates the last_found_time entry in its list for the out_of_range_checker.
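The framing and chunking performed by the sender can be sketched like this. It is a small Python illustration under stated assumptions, not the Rust implementation: the function names frame_message and enqueue_message are hypothetical, while the “$$” delimiter and the 20-byte chunk size come from the description above.

```python
from collections import deque

DELIMITER = b"$$"
CHUNK_SIZE = 20  # maximum number of bytes written per GATT write


def frame_message(payload):
    """Wrap the payload in the start and end delimiters."""
    return DELIMITER + payload + DELIMITER


def enqueue_message(payload, queue):
    """Split the framed message into chunks and push them onto the queue."""
    framed = frame_message(payload)
    for start in range(0, len(framed), CHUNK_SIZE):
        queue.append(framed[start:start + CHUNK_SIZE])
    return queue
```

A 45-byte payload becomes 49 bytes once framed, so it is queued as two full 20-byte chunks followed by one 9-byte chunk.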

Result:

We were successful in creating a reliable connection between Android (SDK version 33+) and Linux devices, with nearby device discovery and message transfer using ble_module.

Device discovered by Android ble_module.

Device discovered by Linux ble_module.

Messages are successfully sent and received by both devices.

Images for BLE Message from Android to Linux

Message sent by Android.

Message received by Linux.

Images for BLE Message from Linux to Android

Message sent by Linux.

Message received by Android.

Limitations:

The above implementation works well for any Linux device with Bluetooth adapter version >= 5.0 and Android SDK version >= 33. If either condition is not met, random bytes of a message are lost, which causes the protocol to fail.

Conclusion

The implementation of the Bluetooth Low Energy module using Bluez is limited in extensibility and reliability, especially within the planned architecture. The instability of Linux Bluetooth can cause message loss if conditions aren’t met, indicating a need for further optimization of the BLE module and the underlying protocol to enhance robustness and reliability.

While working on the Qaul project and implementing the ble module, I learned a lot about peer-to-peer communication, routing, and Bluetooth in general.

I would like to express my sincere gratitude to my mentors, Mathias Jud and Breno, for allowing me to participate in GSoC 2024 and for their invaluable guidance throughout this project. I am also grateful to Andi and all the Freifunk members involved with GSoC for making this project possible.

This marks the end of my GSoC’2024 project, but as I mentioned earlier, there is still work to be done. If you have any questions, please feel free to reach out. I hope you found this project as rewarding and enjoyable as I did!

You can refer to the PR for reference.

GSoC 2024: Development of a Modular Assistant for OpenWrt final report

Introduction

With the GSoC 2024 edition coming to an end, I want to share the final report of my project, which I have been working on for the past few months. My project involved the development of a GUI wizard to simplify the initial OpenWrt configuration. The main objective was to create an intuitive and easy-to-use tool that allows users, regardless of their technical level, to configure their OpenWrt router quickly and safely. This report details the achievements, challenges, and current status of the project, as well as the next steps that could be taken to improve the tool.

Project Objectives

1. Develop an Intuitive User Interface

Create a GUI that is easy to use, regardless of user knowledge, to guide users through the initial configuration steps.

2. Implement Essential Configuration Functionalities

Add the basic settings needed to get an OpenWrt device up and running, such as administrator security, Internet connection, wireless configuration, and activation of additional services.

3. Optimize for Diverse Devices and Usage Scenarios

Ensure that the GUI works efficiently on a wide range of OpenWrt-compatible devices.

Technical Development and Implementation

1. UI Design and Architecture

The interface was designed with usability and accessibility in mind. It follows a modular approach, allowing for future expansion and customization with relative ease.

Key UI Elements:

  • Step-by-Step Navigation: Each step of the setup process is presented in a linear sequence, allowing users to move forward and back as needed.
  • Real-Time Validations: Validations have been implemented to ensure that entered data, such as passwords and network settings, meet required security standards and formats.
  • Responsive Design: The interface adapts to different screen sizes, which is crucial for devices with user interfaces of various sizes, from small routers to tablets or remote access devices.

2. Implemented Configuration Features

a) Administrator Security:

  • Password Setting: A field has been included for the user to define the administrator password during the first configuration. To improve security, the interface requires the password to be at least 10 characters long, including uppercase and lowercase letters, numbers, and symbols.
  • Dynamic Validation: As the user types the password, password strength indicators are displayed, providing instant feedback on its security.
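The password rules above can be expressed as a small check. This is a Python sketch for illustration only and not the wizard's own code; the function name password_problems is hypothetical, and treating the characters in string.punctuation as "symbols" is an assumption.

```python
import string


def password_problems(password):
    """Return the list of rules the password does not yet satisfy."""
    problems = []
    if len(password) < 10:
        problems.append("at least 10 characters")
    if not any(char.isupper() for char in password):
        problems.append("an uppercase letter")
    if not any(char.islower() for char in password):
        problems.append("a lowercase letter")
    if not any(char.isdigit() for char in password):
        problems.append("a number")
    # Assumption: "symbols" means ASCII punctuation characters.
    if not any(char in string.punctuation for char in password):
        problems.append("a symbol")
    return problems
```

Returning the unmet rules instead of a single boolean makes it possible to give feedback while the user is still typing, in the spirit of the dynamic validation described above.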

b) Internet Connection Configuration:

  • Connection Type Selection: The interface allows you to select between several connection types: DHCP, Static IP, and PPPoE.
  • Dynamic Fields: Depending on the connection type selected, different fields are displayed. For example, for a static IP connection, the IP, subnet mask, gateway, and DNS servers are requested, while for PPPoE, the username and password provided by the ISP are required.
  • Auto-Detection: A feature for automatic detection of the connection type, based on the WAN port response, has been implemented, helping less experienced users to select the correct option without needing to know the technical details.
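The dynamic fields can be modelled as a mapping from connection type to the inputs it requires. The following Python sketch is only an illustration of that idea; the field names and the function missing_fields are hypothetical and do not correspond to the wizard's actual form or to OpenWrt option names.

```python
# Hypothetical field names, chosen for this example only.
FIELDS_BY_TYPE = {
    "dhcp": [],
    "static": ["ip_address", "subnet_mask", "gateway", "dns_servers"],
    "pppoe": ["username", "password"],
}


def missing_fields(connection_type, values):
    """Return the required fields that are still empty for this type."""
    if connection_type not in FIELDS_BY_TYPE:
        raise ValueError(f"unknown connection type: {connection_type}")
    return [
        name
        for name in FIELDS_BY_TYPE[connection_type]
        if not values.get(name)
    ]
```

With this shape, the form only has to render the fields listed for the selected type, and the same table drives the check for incomplete input.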

c) Wireless Network (Wi-Fi) Settings:

  • SSID and Password: Users can define the Wi-Fi network name (SSID) and set a strong password for the network. Similar to the administrator password, the Wi-Fi password must meet certain security criteria.
  • Encryption Types: The option to select the network encryption type has been included, with recommendations to use WPA3, the most secure standard currently available. WPA2 support is also provided for older devices.
  • Mesh Networking Configuration: The option to configure mesh networks has been integrated, allowing users to expand their wireless network coverage using multiple OpenWrt devices.

d) Additional Services:

  • VPN: The option to activate a VPN server is offered, providing secure access to the network from remote locations. Basic configuration guides have been included for OpenVPN and WireGuard, the two most commonly used protocols in OpenWrt.
  • DHCP Server: Users can enable or disable the DHCP server on their network, and configure the range of IP addresses to be assigned.

Next Steps

  • Multilingual Support: Expand language support to make the wizard accessible to a broader audience globally.
  • Advanced Settings: Include more options for advanced users, such as custom firewall, and more details in VPN configuration.
  • Documentation and Support: Create more detailed documentation and user guides, including video tutorials, to help new users get familiar with the wizard and OpenWrt capabilities.

Challenges and Lessons Learned

Balancing Simplicity and Functionality

One of the key challenges was to design an interface that is simple enough for beginners while still offering advanced options for more experienced users. To achieve this, the basic settings are shown first, and advanced options are accessible in additional sections.

Improving User Experience

Throughout development, I learned the importance of constant feedback from third parties, such as mentors and outsiders, who were essential in adjusting and improving the interface and ensuring that it met community expectations.

Conclusion

Working on this project as part of GSoC 2024 has been an incredibly rewarding experience.

I want to thank my mentors for all their help and guidance. I also want to thank the GSoC team for providing me with this opportunity for growth and learning.

GSoC 2024: eBPF performance optimizations for a new OpenWrt Firewall, Final report

Hello again, everybody! With GSoC 2024 coming to an end, it is time to present my final blog post/report for my project, showing you what I have achieved during the summer.

Project Goal

This project aims to introduce a new firewall software offloading variant to OpenWrt by intercepting an incoming data packet from the NIC as early as possible inside or even before the network stack through the eBPF XDP or TC hook. After that, the packet might be mangled (e.g., NAT) and redirected to another network interface or dropped.

The result should be that we see a performance increase, either by having a higher throughput, dropping packets faster, or lowering the overall CPU load.

More detailed descriptions of this project can be found in my first blog post here and my midterm update here.

What I did

To achieve the goals of this project, I had to design and implement three parts of the software:

  • The eBPF program which intercepts, mangles, and forwards or drops incoming data packets
  • A user-space program that attaches the eBPF program, reads Firewall rules, and makes routing decisions for the received packets
  • An eBPF hashmap for the communication between the eBPF and user-space programs

Finally, a performance evaluation is required to compare the results of this eBPF implementation against OpenWrt’s current Firewall.

You can find my implementation, measurement scripts, and some plots in my dedicated GitHub repository here: https://github.com/tk154/GSoC2024_eBPF-Firewall

The current implementation state

eBPF kernel-space program

When the eBPF kernel-space program receives a packet, it parses the layer 2, 3, and 4 packet headers, and if a DSA switch receives the packet, it also parses the DSA tag. If any header is unknown in the respective layer, it passes the packet to the network stack. The following headers/DSA tags are currently supported:

Layer 2     | Layer 3 | Layer 4 | DSA Tags
Ethernet    | IPv4    | TCP     | mtk
802.1Q VLAN | IPv6    | UDP     | gswip
PPPoE       |         |         | qca

It then checks inside the eBPF hashmap what to do with the received packet:

  • If there is no entry yet for the packet/flow, it creates a new one to signal the user-space program.
  • If there is an entry but the eBPF program should not redirect the packet, the packet is passed to the network stack or dropped.
  • If there is an entry and the eBPF program should redirect the packet, it …
    • Mangles the packet by applying NAT (if applicable) and adjusting the TTL and checksums
    • Pushes the Ethernet header and possible additional L2 header onto the packet
    • Sends the packet out of the designated network interface
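The lookup logic above can be modelled in a few lines. This is a Python sketch of the decision only, not the eBPF C program: the names Action and handle_packet are hypothetical, a plain dictionary stands in for the eBPF hashmap, and the packet mangling itself is left out.

```python
from enum import Enum


class Action(Enum):
    UNDECIDED = 0  # new entry, user space has not answered yet
    PASS = 1       # hand the packet to the network stack
    DROP = 2
    REDIRECT = 3   # eBPF program forwards the packet itself


def handle_packet(flow_map, flow_key):
    """Return what happens to one packet of the given flow."""
    entry = flow_map.get(flow_key)
    if entry is None:
        # First packet of the flow: create an entry to signal user space.
        flow_map[flow_key] = {"action": Action.UNDECIDED}
        return "pass"
    if entry["action"] is Action.REDIRECT:
        # Here the real program would apply NAT, adjust TTL and checksums,
        # push the L2 headers, and send the packet out of the interface.
        return "redirect"
    if entry["action"] is Action.DROP:
        return "drop"
    return "pass"
```

Until user space changes the action, every packet of the flow keeps going through the regular network stack, which is what makes the hand-over safe.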

User-space program

When the user starts the binary, it attaches the eBPF program to the lowest possible network interfaces on the system or to all network interfaces given by the user per command line argument.

It then loops every n seconds through the flow entries of the eBPF hashmap and checks via nf_conntrack whether a connection tracking exists for that flow.

  • If so, and if the flow entry is new, it …
    • Retrieves possible NAT information via nf_conntrack
    • Makes the routing decision
    • Checks if the eBPF program needs to push layer 2 headers
    • Determines the next hop via rtnetlink
    • Saves all that information inside the eBPF map to signal the eBPF program that it can take over now
  • For all existing flow entries, it updates the nf_conntrack timeouts as long as an established connection tracking entry exists
  • If a connection tracking entry does not exist, it checks Firewall rules via OpenWrt’s scripting language ucode if the eBPF program should drop the packet.

When a configurable flow timeout occurs, the user-space program deletes the flow entry from the eBPF map.
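One pass of this user-space loop might look roughly like the following. It is a simplified Python model and not the real program, which is written against nf_conntrack, rtnetlink, and ucode: the function poll_once, the dictionary layout, the "last_seen" timestamp used for the flow timeout, and the should_drop callback are all assumptions made for this example. Routing, next-hop lookup, and conntrack timeout updates are omitted.

```python
def poll_once(flow_map, conntrack, should_drop, now, flow_timeout):
    """Process every flow entry once.

    flow_map:    flow key -> {"action": str, "last_seen": float, ...}
    conntrack:   flow key -> {"nat": ...} for flows that are being tracked
    should_drop: callback standing in for the firewall rule check
    """
    for key in list(flow_map):
        entry = flow_map[key]
        if now - entry["last_seen"] > flow_timeout:
            # Flow timed out: remove it from the map.
            del flow_map[key]
            continue
        tracked = conntrack.get(key)
        if tracked is not None:
            if entry["action"] == "new":
                # Store NAT information and let the eBPF program take over.
                entry["nat"] = tracked.get("nat")
                entry["action"] = "redirect"
        elif should_drop(key):
            entry["action"] = "drop"
```

Iterating over a copy of the keys allows entries to be deleted during the pass.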

What is left to do

Submitting an XDP generic patch

Currently, for XDP generic, if the pre-allocated SKB does not have a packet headroom of 256 Bytes, it gets expanded, which involves copy operations consuming so many CPU cycles that the hoped-for performance gain is negated. I have already created a patch that makes the XDP generic packet headroom configurable, but I still need to submit it to upstream Linux.

Routing changes

When there is a new flow entry, the user-space program makes a routing decision and stores the result inside the eBPF map. However, such a route may change later, e.g., because the user explicitly changed it or a network interface went down. The user-space program doesn’t react to routing changes yet, which means that the eBPF program keeps forwarding packets to the old routing destination.

Counter updates

As soon as the eBPF program starts forwarding packets, network interface and nf_conntrack counters aren’t updated anymore. Updating the interface counters shouldn’t be a problem, but in my testing, nf_conntrack counter updates seem to get ignored from user-space.

Performance results

Similar to my first blog post, I tested the throughput performance on an AVM FRITZ!Box 7360 v2 running OpenWrt with Linux Kernel version 6.6.41, whose CPU is too weak to saturate its Gigabit ports. I used iperf3 to generate IPv6 UDP traffic for 60 seconds, with NAT applied to the source and destination IPs and ports; you can find the results inside the following plot:

The parts are the following:

  • default: OpenWrt’s Firewall running but without any offloading enabled
  • sw_flow: Netfilter’s software offloading enabled (flow_offloading)
  • xdpg256: The eBPF program attached to the XDP generic hook with the default packet headroom of 256 Bytes
  • xdpg32: The eBPF program attached to the XDP generic hook with a custom packet headroom set to 32 Bytes
  • tc: The eBPF program attached to the TC hook
  • xdpg32_dsa: The eBPF program attached to the XDP generic hook of the DSA switch with a custom packet headroom set to 32 Bytes
  • tc_dsa: The eBPF program attached to the TC hook of the DSA switch

Unfortunately, there is no performance gain when using the XDP generic mode with the default 256 Bytes packet headroom. TC is on the same level as Netfilter’s software offloading implementation. XDP generic with the custom 32 Bytes packet headroom is around 50 MBit/s faster.

The actual performance gain comes into play when attaching the eBPF program to the DSA switch. While XDP generic with 256 Bytes packet headroom is now at least faster than without offloading, XDP generic with 32 Bytes packet headroom is about 250 MBit/s faster than any other offloading, which means about 50% more throughput. TC is also a little bit faster, but there is not such a performance increase as for XDP.

I have created the following graphs using the Linux command line tool perf and scripts from the FlameGraph repository. They show how many CPU cycles Linux kernel functions used for the OpenWrt Firewall running without any offloading and the XDP generic with 32 Bytes packet headroom attached to the DSA switch.

As you can see, since the eBPF program saves some Linux kernel function calls, the CPU can poll for more data via the xrx200_poll_rx function, which consequentially benefits the throughput performance.

Soon, I will also upload the graphs for the other measured parts and the packet dropping performance into my already mentioned GitHub repository.

Concluding thoughts

While implementing this new Firewall offloading variant, I learned a lot of new things, not just about eBPF but also about the Linux kernel and network stack itself. Although it was not always easy, because I had to delve into Netlink first, for example, I also had much fun while coding.

As I have shown, the performance gain is somewhat mixed compared to OpenWrt’s current Firewall. To have a higher throughput, my XDP generic patch would need to be accepted for the upstream Linux kernel.

Finally, I would like to thank my mentor, Thomas, for giving me the chance to participate in GSoC 2024, and the OpenWrt core developer, Felix, for guiding me through the project. Furthermore, I appreciate that Andi and all the Freifunk members involved with GSoC make it possible to participate in such a project.

This concludes my GSoC 2024 project, but as I already mentioned, there is still some work to do. Should you have questions, do not hesitate to contact me. I hope you enjoyed the project as much as I did!

GSoC 2024: Visualise community data, final report

My project on community data visualisation is soon coming to a close; it’s time to review the work and what the outcomes were. The main goal of the project was to

build a [data] pipeline from the JSON directory through to some visualisations on a web page.

With some caveats, we’ve successfully achieved that. I made a tool (json_adder) which reads the JSON files into a MongoDB database, a set of resolvers which provide a query interface, and finally some graphs which call the query interface and render the data using the d3 JavaScript library.

Beyond that, GraphQL was supposed to allow anyone else to

query the data for their own purposes.

I haven’t achieved this, partly due to the limits of GraphQL, which is better designed to service web apps than free-form queries. Queries are flexible only to the extent that they’re defined in the schema. Secondly, all of this is still only deployed locally, although it’s ready to deploy to production when needed.

At least the most difficult part of the project is over: the infrastructure is all there, and the barrier to entry is now much lower for anyone wanting to make a visualisation. New queries / visualisations can be contributed back to Freifunk’s vis.api.freifunk.net repository on GitHub.

Graphs

Top ten communities with the most nodes

This graph is very simple. Top of the list is Münsterland, with 3,993 nodes at the time of writing. This doesn’t quite tell the whole story, because unlike the city-based communities in this lineup, the Münsterland nodes are spread far and wide around Münster itself. Nevertheless, congratulations to Münsterland for the excellent network coverage!

Sources: MongoDB query, d3 code.

This bar graph was the first one I made, as a simple proof of concept. Since this worked well enough, I moved on to a line graph example, something which could take advantage of the time-series data we have.

Growth of the network over time

This graph shows the sum total of nodes across the network per month, from 2014-2024. The number of nodes grew rapidly from 2014-2017 before tapering off into a stable plateau. The high point ran through September and October 2019, with around 50,700 nodes at the peak.

Sources: MongoDB query, d3 code.

For some context on this curve, Freifunk as a project began in 2004, and only started collecting this API data in 2014. Part of the initial growth in nodes could be accounted for by new communities slowly registering their data with the API.

The query behind this graph became too slow after I’d imported the full 2014-2024 hourly dataset into MongoDB.[1] Daily data was more performant, but was still unnecessarily granular, so what you see here is grouped by daily-average per month.

For shorter time periods, the data is still there to query by day or by hour, and this is something which could be worked into an interactive visualisation.
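To illustrate the grouping, here is one way to reduce hourly samples to a per-month figure. This is a Python sketch and not the MongoDB query behind the graph; the function name monthly_daily_average is hypothetical, and reading "daily-average per month" as "average each day first, then average those daily values within each month" is my assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_daily_average(samples):
    """samples: iterable of (datetime, node_count) readings."""
    per_day = defaultdict(list)
    for timestamp, nodes in samples:
        per_day[timestamp.date()].append(nodes)

    per_month = defaultdict(list)
    for day, counts in per_day.items():
        # One value per day: the average of that day's readings.
        per_month[(day.year, day.month)].append(mean(counts))

    # One value per month: the average of its daily averages.
    return {month: mean(values) for month, values in sorted(per_month.items())}
```

Averaging per day first keeps days with missing hourly readings from being under-weighted in the monthly figure.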

Routing protocols in use across the network

The next thing we wanted to see was the distribution of routing protocols. This graph is similar to the one above: it counts the protocols used by each community every day, and averages the total each month.

Sources: MongoDB query, d3 code.

This graph doesn’t have a legend (#57), and the colours change on every page load (#56). I don’t know why the colours change like this. Is this a common thing with d3? If anyone can resolve this, please help!

In the meantime, I have copied the d3 swatches example, to make a custom legend for the above graph.

We see that batman-adv is by far the most popular protocol, followed by 802.11s, which appears in the dataset in Duisburg in August 2015.

Remaining work

Everything here works locally, which leaves deployment for public use as the main remaining task. We always had in mind that this would eventually have to be deployed, so thankfully we’ve already done some of this work, and most of the services involved can be easily packaged up and deployed to production. Andi is going to investigate webpack for bundling the JavaScript, trying to avoid having to procure (and especially maintain) another server.

Outstanding issues

  • Make graphs interactive (#47).
  • Provide snippets that can be embedded in community web pages (#49).

There isn’t a GitHub issue for this, but the graph page is really a test page with only enough boilerplate HTML to show the graphs. The page would benefit from some styling, a small amount of CSS to make it look more… refined.

Visualisation ideas

  • A word cloud of communities, scaled by node count (#46).
  • A heatmap of communities across the network, showing geographical concentrations of nodes (#41).
  • Hours of the day with most nodes online (#12).

Things I learned

This was my first time building a traditional web service end to end, from the database all the way through to a finished graph. In the end, the final three graphs do not look individually impressive, but a lot of work went into the process.

I took the opportunity to write the data loading code in Rust, and while I am proud of that, it took me a long time to write and wasn’t easy for others to contribute to. Andi rewrote this data loading code in Python much more quickly, and then made useful additions to it. Coding in a well-understood language like Python had advantages for project architecture which outweighed whatever technical arguments could be had around Rust.

In a similar vein, I learned how to write GraphQL schemas and queries, and arrived at the conclusion that GraphQL is not ideally suited for the kind of public query interface that I had in mind. The happy side of this lesson was that I ended up relying on MongoDB functions, and became better at writing those instead.

  1. As an aside, importing the full decade’s worth of hourly data made my (high-end workstation) desktop computer really sweat. It was very satisfying to start the import, hear the fans on my computer spinning up to full speed, and physically feel the weight of all that processing.

GSoC 2024: Development of a Modular Assistant for OpenWrt Update

Project Objectives

What has been achieved in this first half?

The goal is to design an OpenWrt configuration wizard to simplify the device configuration process, especially for those users who do not have deep technical knowledge.

1. Improve UI:

A clean, modern and easy-to-use user interface was developed for everyone. It allows you to follow a step-by-step process to configure the device with clear and well-defined options, with intuitive navigation between steps, allowing users to move back and forth with ease.

2. Detailed Configuration Steps

Step 1: Language Selection

Users can choose the language of their preference for the wizard, thus allowing a better understanding of the process.

Step 2: Security

At this stage, users can enter the device name and set administrator passwords. Validations have been implemented to ensure that passwords meet security requirements: at least 10 characters, including numbers, symbols, and a combination of upper and lower case letters.

Step 3: Internet Connection

Here you select the type of Internet connection you want to configure:

– DHCP: The router automatically obtains the IP address and other network configuration parameters from the Internet Service Provider (ISP), simplifying the configuration process.

– Static IP: Allows users to manually enter the IP address, subnet mask, gateway and DNS servers, useful for networks that require specific configurations or when using a fixed IP address.

– PPPoE: Primarily used in DSL connections, it requires the user to enter a username and password provided by the ISP.

Step 4: Wireless Configuration

At this stage, users can configure their router’s wireless network:

– SSID: The name of the Wi-Fi network that devices will see when searching for available networks.

– Wi-Fi Password: Users can set a password for their Wi-Fi network, with security validations similar to administrator passwords.

– Wireless Encryption Type: We have implemented the selection of the encryption type to improve network security.

– Mesh Network: Users can configure mesh networks to expand the coverage of their Wi-Fi network, improving connectivity in large areas.

Step 5: Additional Services

In this section, users can enter additional settings such as longitude and latitude, or activate optional services:

– VPN: Allows users to securely connect to the local network from remote locations.

– DHCP: Allows the router to automatically assign IP addresses to devices on the local network.

Step 6: Summary

The last step that users encounter is a confirmation summary of the process choices.

Importance of the Advances Made

Flexibility and Security

Allowing selection of wireless encryption type is crucial for users to secure their network according to their specific needs and device compatibility. WPA3, for example, offers significantly improved security compared to WEP.

Easy to use

The new interface and step-by-step navigation simplify the setup process, making it accessible even to those without deep technical knowledge. This lowers the barrier to entry and allows more people to use and benefit from OpenWrt.

Mesh Network Configuration

Integrating the mesh networking option expands network coverage, improving connectivity over large areas and providing a more consistent and reliable user experience.

Next steps

Integration of More Options for Additional Services

Integrate further services to extend the wizard’s functionality.

User Interface Optimization:

Modify the user interface based on the feedback received to make it as easy and intuitive as possible.

Exhaustive Testing

Perform tests to ensure the operation and stability of the assistant.

GSoC 2024: eBPF performance optimizations for a new OpenWrt Firewall, Midterm update

Hello again, everybody! This is the Midterm follow-up blog post for my GSoC 2024 topic: “eBPF performance optimizations for a new OpenWrt Firewall.” It will cover how I started trying to solve the task, what the current implementation looks like, and what I will do in the upcoming weeks.

As a quick reminder: The project’s goal is to implement a new OpenWrt Firewall offloading variant using eBPF. Why eBPF? Because with eBPF, you can intercept an incoming data packet from the NIC very early inside or even before the Linux network stack. After intercepting the packet with an eBPF program at the so-called XDP or TC hook, you can mangle it, redirect it to another network interface or back out of the same one, or drop it. Mangling the packet could mean, for example, applying Network Address Translation (NAT), adjusting the Time-To-Live (TTL), or recalculating the checksum(s).

The result should be that we see a performance increase, either by having a higher throughput, dropping packets faster, or lowering the CPU load.
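As an illustration of the mangling step, the following sketch decrements the TTL of an IPv4 header and updates the header checksum incrementally instead of recomputing it over the whole header. It is a Python model using the standard incremental update from RFC 1624, not code taken from the eBPF program; the function names are hypothetical, the header is represented as a list of ten 16-bit words (no IP options), and the TTL is assumed to be greater than zero.

```python
TTL_PROTO_WORD = 4  # 16-bit word holding TTL (high byte) and protocol
CHECKSUM_WORD = 5   # 16-bit word holding the header checksum


def add16(a, b):
    """One's complement addition of two 16-bit values."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def full_checksum(words):
    """Checksum over all header words, treating the checksum field as 0."""
    total = 0
    for index, word in enumerate(words):
        if index != CHECKSUM_WORD:
            total = add16(total, word)
    return ~total & 0xFFFF


def incremental_checksum(old_checksum, old_word, new_word):
    """RFC 1624, equation 3: HC' = ~(~HC + ~m + m')."""
    total = add16(~old_checksum & 0xFFFF, ~old_word & 0xFFFF)
    total = add16(total, new_word)
    return ~total & 0xFFFF


def decrement_ttl(words):
    """Return a copy of the header with TTL - 1 and an updated checksum."""
    old_word = words[TTL_PROTO_WORD]
    new_word = old_word - 0x0100  # assumes TTL > 0
    updated = list(words)
    updated[TTL_PROTO_WORD] = new_word
    updated[CHECKSUM_WORD] = incremental_checksum(
        words[CHECKSUM_WORD], old_word, new_word
    )
    return updated
```

The incremental form only touches the changed word and the old checksum, which is why it is attractive in a per-packet fast path.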

Current implementation

The implementation consists of three components:

  • The eBPF program which intercepts, mangles, and forwards or drops incoming data packets from a network interface
  • A user-space program that attaches the eBPF program to the appropriate network interfaces and determines whether to forward a received packet and where to
  • An eBPF map (in this case, a key-value hash map) so that the eBPF and user-space program can communicate with each other

Originally, I wanted to parse all OpenWrt Firewall rules and dump them into the eBPF map when the user-space program starts. When the eBPF program received a packet, it would try to match it with one of the parsed rules. But I had a few talks with the OpenWrt community and my mentor and concluded that this approach poses some problems:

  1. eBPF has limited looping support, but for rule matching, it is necessary to loop.
  2. OpenWrt uses the Netfilter framework as its firewall backend, which has features that are (too) complex to implement in eBPF, such as the logging of packets.

That is why we decided to go for a “flow-based” approach. When the eBPF program receives a packet, it creates a tuple from some crucial packet identifiers (Ingress interface index, L3 and L4 protocols, and source and destination IPs and ports). The program uses this tuple as the key for the eBPF hash map to signal the user-space program that it has received a packet for a new flow so that it can look up what the eBPF program should do with packets for that particular flow.

Until the user-space program responds, the eBPF program passes all packets belonging to that flow to the network stack, where the Netfilter framework processes it for now. In the meantime, the user-space program checks what the eBPF program should do with packets from that flow and stores the result inside the hash map as the value.
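The flow key can be pictured as an immutable tuple used as a hash map key. The sketch below is a Python model with hypothetical names (FlowKey, flow_key_from_packet, and the dictionary fields of the packet); the real key is a C struct inside the eBPF map, but the identifying fields are the ones listed above.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    ifindex: int   # ingress interface index
    l3_proto: str
    l4_proto: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def flow_key_from_packet(packet):
    """Build the key from an already parsed packet (a plain dict here)."""
    return FlowKey(
        packet["ifindex"],
        packet["l3_proto"],
        packet["l4_proto"],
        packet["src_ip"],
        packet["dst_ip"],
        packet["src_port"],
        packet["dst_port"],
    )
```

Two packets of the same flow produce equal keys and therefore hit the same map entry, while the reply direction (source and destination swapped) yields a different key and is tracked as its own flow in this model.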

Connection Tracking must also be available because the to-be-implemented offloading variant should be stateful instead of stateless. I first thought about implementing it in the eBPF or user-space program. But then I realized I would somewhat reinvent the wheel because OpenWrt uses the Netfilter framework, which has a connection tracking implementation called nf_conntrack.

The Netfilter project provides an API through its user-space library libnetfilter_conntrack to add, retrieve, modify, and delete connection tracking entries. I am using this API in my implementation to check whether a conntrack entry exists for a packet flow. In the case of TCP, the offloader only forwards packets while a connection is in the “Established” state so that Netfilter can still handle the opening and closing states of TCP connections. In the case of UDP, the eBPF offloader starts forwarding packets on its own as soon as, and for as long as, a conntrack entry exists. Meanwhile, the user-space program updates the timeouts for offloaded connections.
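As a rough sketch of that decision, assuming the conntrack lookup has already happened and its result is reduced to a state string (the function name and the state strings are hypothetical, not the libnetfilter_conntrack API):

```python
from typing import Optional

IPPROTO_TCP = 6
IPPROTO_UDP = 17


def may_offload(l4_proto: int, conntrack_state: Optional[str]) -> bool:
    """Decide whether a flow may be handed to the eBPF fast path.

    conntrack_state is None when no conntrack entry exists for the flow;
    otherwise it is a (hypothetical) string naming the entry's state.
    """
    if conntrack_state is None:
        return False                      # no entry yet: leave it to Netfilter
    if l4_proto == IPPROTO_TCP:
        # Netfilter keeps handling connection setup and teardown.
        return conntrack_state == "ESTABLISHED"
    if l4_proto == IPPROTO_UDP:
        return True                       # any existing entry is enough
    return False                          # other protocols are not offloaded
```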

And there is a nice bonus when using nf_conntrack: a connection tracking entry directly has NAT information available, so you don’t have to retrieve it by parsing OpenWrt firewall rules. Furthermore, this means that the forwarding part of the eBPF offloader can run independently of the Linux distribution used. It only depends on an OS that runs the Netfilter framework, including nf_conntrack.

Packet Forwarding

The following simplified activity diagram illustrates how incoming packets are forwarded by the current implementation of the offloader:

Figure 1: eBPF Packet Forwarding

Here is a step-by-step explanation of what is happening:

  1. The eBPF program receives a data packet from the NIC for a not-yet-seen flow. It creates the packet tuple key and uses it to check whether an entry for that flow already exists inside the eBPF hash map. Since it hasn’t seen the flow yet, there is no entry, so the eBPF program creates a new empty entry inside that map to signal the user-space program. Meanwhile, it passes all the following packets of that flow to the network stack until the user-space program responds.
  2. When the user-space program wakes up, it retrieves the new flow entry from the map and checks through libnetfilter_conntrack whether a conntrack entry for the flow exists. If not, or the TCP state isn’t established, it doesn’t respond to the eBPF program (yet), so packets continue passing to the network stack. If there is an (established) conntrack entry, it also looks up inside that entry if NAT needs to be applied and, if so, calculates the checksum difference. Finally, it updates the flow entry accordingly to signal the eBPF program that it can take over now.
  3. When the eBPF program receives another data packet for that flow, it reads from the flow entry that it can now forward the packet, so it applies any required NAT and checksum adjustments and redirects the packet to the target network interface. When there is a TCP FIN or RST, or a conntrack timeout occurs, the eBPF program stops forwarding packets of that flow and passes them to the network stack again.
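The checksum adjustment mentioned in steps 2 and 3 can be illustrated with the incremental update from RFC 1624, which patches an Internet checksum when a 16-bit word changes instead of recomputing it over the whole header. The sketch below is not the offloader's code; the function names and the sample header words are made up for illustration.

```python
from typing import Iterable


def ones_complement_sum(words: Iterable[int]) -> int:
    """Add 16-bit words with end-around carry."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words: Iterable[int]) -> int:
    """Internet checksum over 16-bit words (checksum field excluded)."""
    return ~ones_complement_sum(words) & 0xFFFF


def checksum_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, equation 3: HC' = ~(~HC + ~m + m')."""
    total = ones_complement_sum(
        [~old_csum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    )
    return ~total & 0xFFFF


# Made-up IPv4 header words; source 192.168.1.10 is rewritten to 203.0.113.5.
old_header = [0x4500, 0x0054, 0x0000, 0x4000, 0x4006,
              0xC0A8, 0x010A, 0x0808, 0x0808]
new_header = list(old_header)
new_header[5], new_header[6] = 0xCB00, 0x7105

incremental = checksum(old_header)
incremental = checksum_update(incremental, 0xC0A8, 0xCB00)
incremental = checksum_update(incremental, 0x010A, 0x7105)
recomputed = checksum(new_header)   # same value as the incremental result
```

Because the changed words are known once the NAT mapping is known, the difference can be computed once per flow and then applied to every packet.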

Where to attach? Where to send?

There are two things I didn’t mention yet about the implementation:

  1. On which network interfaces should I attach my eBPF program?
  2. What is the next hop for the packet, i.e., to which output interface and neighbor to send it?

I implemented the latter within the user-space program using the Linux routing socket RTNETLINK. When I started to implement this, I performed the following three steps to determine the next hop:

  1. Send an RTM_GETROUTE message containing the packet tuple to determine the route type and output interface. I only offload unicast flows.
  2. Send an RTM_GETLINK message containing the output interface to determine the source MAC address.
  3. Send an RTM_GETNEIGH message containing the output interface and the destination IP to determine the destination MAC address.

Finally, the user-space program stores the output interface, source, and destination MAC address inside the flow entry. The eBPF program then rewrites the MAC header and redirects the packet to the output interface. But I wasn’t satisfied with that approach yet; I will explain the reason based on the following picture:

Figure 2: Example network interfaces on an OpenWrt device

The picture shows the network interfaces of my AVM FritzBox 7530 running OpenWrt. As you can see, all four LAN ports of my private network and my WiFi are bridged (which is typical, I think, and generally default for an OpenWrt installation). My dsl0 WAN port has a Point-to-Point Protocol over Ethernet (PPPoE) interface on top to establish a VDSL connection to my ISP, which additionally requires tagged VLAN packets (dsl0.7).

When no offloading is happening and, for example, my notebook connected to phy1-ap0 sends traffic to the internet, the packets travel through all shown interfaces except the LAN ports (Figure 3). Regarding the eBPF offloader, the simple way would be to attach the eBPF program to the br-lan and pppoe-wan interfaces because I wouldn’t have to parse any additional L2 headers. The same goes for making routing decisions since you wouldn’t have to query more interface information or push L2 headers. But the eBPF fast path would be minimal in that case (Figure 4).

I thought this was not an acceptable solution for this project because the idea is to intercept an incoming packet as soon as possible. At the same time, the offloader should also send out packets at the lowest possible network interface. Therefore, the user-space program currently attaches the eBPF program to the lowest possible network interface and, while making the routing decision, also tries to resolve to the lowest possible network interface (Figure 5).

Figure 3, 4, and 5: Packet traversal for different offloading variants

The following flowchart shows how the user-space program currently does the next-hop determination:

Figure 6: Next Hop determination via Netlink

The eBPF program can currently parse the following headers of the respective layers. If it receives any packet containing an L2, L3, or L4 header not mentioned here, it passes the packet to the network stack.

  • L2: VLAN (currently only one) and PPPoE
  • L3: IPv4 and IPv6
  • L4: TCP and UDP
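To make the L2 part of that list concrete, here is a minimal Python sketch of how a parser could walk an Ethernet frame with at most one VLAN tag and an optional PPPoE session header to find the L3 header. The function name find_l3 is hypothetical and the real parser is eBPF code; only the EtherType and PPP protocol numbers are standard values.

```python
import struct
from typing import Optional, Tuple

ETH_P_8021Q = 0x8100     # VLAN tag
ETH_P_PPP_SES = 0x8864   # PPPoE session
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
PPP_IP = 0x0021          # PPP protocol number for IPv4
PPP_IPV6 = 0x0057        # PPP protocol number for IPv6


def find_l3(frame: bytes) -> Optional[Tuple[str, int]]:
    """Return (l3_name, offset), or None to leave the frame to the stack."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    if ethertype == ETH_P_8021Q:
        # Only a single VLAN tag is handled: 2 bytes TCI + inner EtherType.
        if len(frame) < offset + 4:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETH_P_PPP_SES:
        # 6-byte PPPoE header followed by the 2-byte PPP protocol field.
        if len(frame) < offset + 8:
            return None
        (ppp_proto,) = struct.unpack_from("!H", frame, offset + 6)
        offset += 8
        ethertype = {PPP_IP: ETH_P_IP, PPP_IPV6: ETH_P_IPV6}.get(ppp_proto)
    if ethertype == ETH_P_IP:
        return ("ipv4", offset)
    if ethertype == ETH_P_IPV6:
        return ("ipv6", offset)
    return None   # unknown or unsupported header
```

A frame with two stacked VLAN tags falls through to the final return, which matches the "currently only one" restriction above.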

DSA: Going one step further down

As you might have seen in the flowchart of Figure 6, the user-space program also parses DSA interfaces, which stands for Distributed Switch Architecture. Routers typically contain an Ethernet Switch for their LAN ports, which has a management port connected to an Ethernet controller capable of receiving Ethernet frames from the switch. While Linux creates a network interface for that Ethernet controller, you can observe that the DSA driver also creates network interfaces (DSA ports) for the front panel ports.

Ideally, when the switch and management interface exchange packets, they tag the packets with a switch tag (or DSA tag), which contains the front panel port ID. When the management interface receives a packet from the switch, it can determine from the tag from which front panel port the packet comes and pass it to the appropriate DSA port/interface. When the switch receives a packet from the management interface, it can figure out from the tag to which front panel port it must send the packet.

Let’s consider the following picture, which shows how OpenWrt with default settings uses DSA on a Banana Pi BPI-R64. The DSA switch (or conduit) is eth0, and lan1, lan2, lan3, lan4, and wan are the DSA ports (or users).

Figure 7: Example network interfaces on an OpenWrt device using a DSA driver

Without offloading, a network packet sent from the private LAN to WAN would go through eth0, lan*, br-lan, wan, and eth0 again (Figure 8). When using the eBPF offloader without attaching to the DSA switch eth0, it is possible to avoid the bridge br-lan (Figure 9). But if you now attach the eBPF program to the DSA switch eth0, it can read and write the DSA tags of packets by itself, and the user-space program can then figure out which front panel port received the packet and to which one to send a packet. So when the eBPF program receives a packet on eth0, it can send it out of eth0 again without any intermediate interface (Figure 10).

Figure 8, 9, and 10: Packet traversal through a DSA switch for different offloading variants

Although this has the disadvantage that an eBPF program isn’t “generic” anymore because you need to compile it for the DSA driver used by the target device, it has the potential to further increase the forwarding performance.

Work to do in the upcoming weeks

There are a few problems I have encountered or thought of:

  • I am unsure if nf_conntrack is sufficient for connection tracking because it isn’t possible to query conntrack entries based on the interface that received the packet. I think this can lead to collisions when different interfaces receive identical L3 and L4 flows.
  • Unfortunately, it is currently impossible to update the nf_conntrack packet and byte counters. This might be patchable in the Linux kernel, but my current workaround is to turn off the counters because I think it is better to have no counters than wrong counters.
  • I have shown that I retrieve PPPoE information in user space. The problem is that you cannot do that directly via Netlink since the interface attributes don’t provide PPPoE information. This is why I currently retrieve the interface’s link-local peer IPv6 address, convert it to a MAC address, and try to find that MAC inside the file “/proc/net/pppoe”, which is populated by the ppp daemon. I am anything but satisfied with that, but I haven’t found a better way yet.
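The address conversion in the last point can be sketched as follows. This assumes that the peer's link-local address carries an interface identifier derived from its MAC address in the modified EUI-64 format, which the post does not state explicitly; the function name is hypothetical, and the lookup in /proc/net/pppoe is left out.

```python
import ipaddress
from typing import Optional


def link_local_to_mac(addr: str) -> Optional[str]:
    """Recover a MAC address from a modified EUI-64 interface identifier."""
    iid = ipaddress.IPv6Address(addr).packed[8:]   # lower 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None   # not built from a MAC address, nothing to recover
    # Flip the universal/local bit back and drop the inserted ff:fe bytes.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{byte:02x}" for byte in mac)
```

If the identifier was negotiated in some other way, the function returns None and the MAC cannot be derived from the address at all, which is one reason this workaround is fragile.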

Besides trying to solve those problems, the next milestone is to implement an eBPF packet dropper in the offloader because, for now, it only forwards packets on its own. After that, I will finally evaluate the performance of the implementation.

If you have questions, as always, feel free to ask them, and thank you for reading my Midterm update!

GSoC 2024: New release for Project Libremesh Pirania – Part II

Hello! This post is about my progress so far while working on the new release of the Pirania package for the new version of LibreMesh, 2024.1, which runs on top of OpenWrt 23.05.3.

During the last month, there was a lot of interaction with the community via mailing lists and the Matrix chat room.

Goals of this project

Pirania is a captive portal designed for community networks. It allows community members to create vouchers (or tickets) in order to manage access to the internet. When a device accesses the network for the first time, it is redirected to the captive portal. The user then has to enter a voucher previously created by a community operator.

This promotes the sustainability of the network, since there are costs involved in maintaining one.

What needs to be done

In version 22.03 of OpenWrt, the framework for packet processing and firewalling was changed from iptables (firewall3) to nftables (firewall4). Since the Pirania captive portal uses iptables rules to redirect and allow/deny traffic from clients, the rules created by the captive-portal script need to be updated as well.

First try

Here I will discuss what worked and what did not.

Since I have a router compatible with the old Lime version 2020.4, a TP-Link Archer C50 v1, I wanted to flash it and see Pirania’s functionality in practice. I downloaded a pre-compiled firmware and flashed it. It worked, and the next step was to install Pirania and start it.

I got some errors while installing Pirania (in the feeds, more specifically while running “opkg update”), which I reported in the Matrix chat. Community members helped me and confirmed that this error is not present in recent versions.

Error:

Collected errors:
opkg_download: Failed to download http://downloads.openwrt.org/releases/19.07.10/packages/mipsel_24kc/libremesh/Packages.gz, wget returned 8.
opkg_download: Failed to download http://downloads.openwrt.org/releases/19.07.10/packages/mipsel_24kc/profiles/Packages.gz, wget returned 8.

If you run into errors during the update and install process of Pirania, do the following:

“it should be enough to delete the libremesh and profiles rows in /etc/opkg/distfeeds.conf as the correct info should be already present in /etc/opkg/limefeeds.conf”
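As a hedged illustration of that advice, the following Python sketch filters the text of a feeds file and drops the rows whose URL ends in libremesh or profiles. It assumes that each feed row has the form "src/gz <name> <url>"; the function name and the sample rows in the usage note are made up, and on a real router you would more likely edit the file by hand.

```python
from typing import Iterable


def drop_feeds(conf_text: str,
               feeds: Iterable[str] = ("libremesh", "profiles")) -> str:
    """Return the feeds file content without the rows for the given feeds."""
    unwanted = set(feeds)
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Assumed row format: src/gz <name> <url>
        is_feed = len(fields) == 3 and fields[0] in ("src", "src/gz")
        if is_feed and fields[2].rstrip("/").rsplit("/", 1)[-1] in unwanted:
            continue   # skip the rows that fail to download
        kept.append(line)
    return "\n".join(kept) + "\n"
```

For example, given a row such as "src/gz openwrt_libremesh http://downloads.openwrt.org/releases/19.07.10/packages/mipsel_24kc/libremesh" (an invented row built from the URL in the error message), the function leaves it out and keeps all other rows untouched.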

After changing this file, I was able to install the Pirania package. However, I forgot to install ip6tables-mod-nat and ipset, and my router ended up in a weird state. Moving on...

Second try

In one of the previous GSoC editions, there was a project that aimed at easing the virtualization of LibreMesh. It is available here. But since the contributor has not made the requested modifications, it is still open.

I was able to virtualize both Lime 2020.1 and 2024.1 versions. I used the scripts available in lime-packages/tools to emulate them with Qemu. Unfortunately, I wasn’t able to provide internet access to the node itself.

Third try

I had a Rocket M5 MX standing idle and decided to flash the latest version of LibreMesh on it. The installation was easy and it is working fine. I just had to add the following lines to /etc/config/lime-node so that the router gets a valid IP from my local network, since it only has one physical interface, in order to install the ipset package.

config lime network
config net portwan
      option linux_name 'eth0'
      list protocols 'wan'

Then, I was able to install the dependencies necessary to test my code.

Workflow

It’s really easy to test new software in LibreMesh, since it usually consists of scripts that can be modified and run at run time. Just modify the script, upload it to the working node, and you are ready to go.

Code so far

I’m currently working on this branch, linked below:

https://github.com/henmohr/lime-packages/blob/mohr-patch-nftables-1/packages/pirania/files/usr/bin/captive-portal

Next steps

The next step is to upload this script to a running node and see what happens.

The code needs more comments. Also, with nftables it is possible to enable remote logging of each rule that is executed, which will help a lot in debugging this script.

Also, I managed to set up a working node using VirtualBox. Maybe an alternative would be to create a VM with some Linux distribution and then connect it to the LibreMesh node, easing the process of testing.

GSoC 2024: Visualise community data, update 1

I’ve set up a lot of services since planning out the data pipeline in June; the new infrastructure diagram looks like this:

So, I wrote a little Rust utility called json_adder which takes the JSON files and reads the data into a MongoDB collection. We also have a GraphQL server, running through Node.js, which can handle some (very) simple queries and return data. Many of these services are individually easy to set up; the tricky part is making sure everything works together correctly. This is what your favourite technical consultancies call “fullstack development”.

Data loading

The first change from the initial plan was to use MongoDB instead of MySQL as a database. MongoDB has a built-in feature for handling time-series data, which is what we’re working with. It integrates well enough with GraphQL, and since one of my mentors (Andi Bräu) works with MongoDB on a daily basis, there’s a lot of experience to draw upon.

Here is the Rust code for the utility which adds the data to the database, and here’s what it does:

  • Read over each JSON file in the directory in a loop.
  • Extract the last-modified time from each community in the file and do some conversion to BSON.
  • Read the data into a struct, and add each object to a vector.
  • Insert the vector into the MongoDB collection.
  • GOTO next file.
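The loop above can be approximated in Python as follows. This is a sketch, not the Rust utility: it assumes that each file is a JSON object keyed by community name and that each community carries its last-modified time as Unix seconds in a field called "mtime" (both are assumptions), and it stops short of the MongoDB insert.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def read_batches(data_dir: Path) -> Iterator[Tuple[str, List[Dict]]]:
    """Yield (file name, documents) for every JSON file in the directory."""
    for path in sorted(data_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            communities = json.load(handle)
        docs = []
        for name, community in communities.items():
            # "mtime" as Unix seconds is an assumed field name and format.
            modified = datetime.fromtimestamp(
                int(community["mtime"]), tz=timezone.utc
            )
            docs.append(
                {"timestamp": modified, "metadata": name, "data": community}
            )
        # The real utility would insert this batch into the collection here.
        yield path.name, docs
```

Yielding one batch per file mirrors the "insert the vector, then go to the next file" structure, so a database insert could be dropped in where the comment sits.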

When setting up the connection to MongoDB, the setup_db module will also create the collection if it doesn’t already exist. This extra bit of logic is ready for when this has to be run in an automated process, which might involve setting up the database and filling it with data in one go.

I don’t have a binary to link here, and will write build instructions in the future.[1] On each commit which changes Rust code, GitHub Actions runs cargo check on the repository. When it comes to deploying this for real, we can change that workflow to provide release builds.

At the moment it is a one-way process. You point json_adder at the data directory and it adds everything to the database. For future work, I’ll need a way to keep the database in sync: only add the files which have been modified, and run it on a schedule. For now, it works fine.

GraphQL

Here is the GraphQL server code. Fetch the dependencies, in this order:

npm install express express-graphql graphql mongodb

The ordering of these is important.

Then run npm start and open http://localhost:4000/api in your browser. You should see the GraphQL IDE.

At the moment, the GraphQL server can handle a query with an argument for metadata. I’ll build out the schema, and the resolver, to handle more arguments / more complicated queries. A lot of the documentation for GraphQL assumes a fairly simple schema, built for handling calls to a web app, which is slightly different from our case. The JSON Schema for the Freifunk API is not only quite long, it is also versioned, and newer versions are not necessarily backwards-compatible. I am going to sidestep this complexity by writing a GraphQL schema which only includes fields available in all versions. My first working example to try is a query which counts the number of nodes per community.
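What such a query would have to compute can be sketched in plain Python, independent of GraphQL. The document shape used here (a "metadata" community name, a "timestamp", and the node count under data.state.nodes) is an assumption made for this example, as is the function name.

```python
from typing import Dict, Iterable


def nodes_per_community(docs: Iterable[Dict]) -> Dict[str, int]:
    """Return the node count of the most recent document per community."""
    latest: Dict[str, Dict] = {}
    for doc in docs:
        name = doc["metadata"]
        if name not in latest or doc["timestamp"] > latest[name]["timestamp"]:
            latest[name] = doc
    # data.state.nodes is an assumed location of the node count.
    return {name: doc["data"]["state"]["nodes"]
            for name, doc in latest.items()}
```

Taking only the newest document per community avoids summing the same nodes once per snapshot, which matters for time-series data.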

You’ll notice that the last step in the pipeline is yet to be completed: I don’t have a visualisation to show. But now that we can reliably pull data all the way through the pipeline, we’re almost ready to make some graphs.

  1. In the meantime, you only need to run cargo build in the json_adder directory.

GSoC 2024: LibreMesh Cable Purpose Autodetection Update

Hello everyone! This blog post is meant as a mid-point update on the Cable Purpose Autodetection project.

This first part of the project involved a lot of discussion about how the project will take shape and how it will materialize for a LibreMesh user. These discussions mostly happened in the LibreMesh mailing list and in the public Matrix channel.

What is the goal of the project again?

In some scenarios, routers running LibreMesh benefit greatly from having specific configurations applied, which are not yet natively integrated. As of now, an expert needs to step in and set up these configurations manually, which is not always an option for communities. For this project, I am investigating scenarios that would benefit from automatically applied configurations.

What was discussed

There were a lot of discussions in the LibreMesh mailing list and in private messages about the implementation and the scope of the project.

Here is a sample of configurations where it would be useful to have such a system:

  • A LibreMesh router is connected to the internet and is serving internet access to its clients.
  • A LibreMesh router is connected using Ethernet to another LibreMesh router in the same network, to extend coverage.
  • A LibreMesh router is connected to another router not running LibreMesh, but working as an access point (in a point-to-point link) for clients connecting to that router, and transferring data to the LibreMesh router.

Multiple solutions were discussed in the mailing list. For example, it would be useful to detect whether a neighbor router is running LibreMesh or another firmware. A solution consisting of LibreMesh having its own IPv6 multicast has been discussed. This would allow a LibreMesh-running router to be more aware of what is around it, and it could also detect whether another LibreMesh router is part of the same network or a different one.

Implementation Solutions

One main point of discussion was “how to apply the configurations”. There was a lot of back-and-forth on that subject. The final software would take the shape of multiple scripts, each dedicated to tackling a single configuration setting.

  • The configuration scripts could be run automatically at runtime. For example, a script could run every time an interface goes up or down. This means that if a link between two routers is broken and keeps going online and offline, the configuration setting would be applied repeatedly and, in the process, could break some other part of the configuration. In that situation, potential problems would be very hard to troubleshoot if the user is not warned that this auto-configuration is occurring.
  • The configuration scripts could be run when the user runs “lime-config”, which is an existing configuration tool for LibreMesh. This would avoid the problem of a script running repeatedly, because the user would stay in control, but it somewhat defeats the “running configurations automatically” aspect of the project.
  • Another solution would be to implement buttons and switches in the Lime-app (which currently allows you to manage a node and its network), and add a new page or field where a user could choose whether a specific configuration should be applied automatically, applied manually, or not applied at all. The problem is that the names of these settings might be obscure to less tech-savvy users.

So, we have different possibilities, each with their advantages and drawbacks. The solution will be to have some auto-configurations applied in one way, and others applied in another way.

Virtualization and problems encountered

One of my biggest roadblocks in starting this project was running virtualization software. A previous Google Summer of Code participant made modifications to the LibreMesh code to allow virtualization using Qemu or Ansible (available here), but this code wasn’t pulled into the LibreMesh repository because it is not fully ready yet. This implementation contains instructions and modifications that allow LibreMesh to be virtualized with Qemu and Ansible. Both tools have different working and breaking points, and I had to experiment a lot to learn how to solve or work around them.
Using Qemu, it is possible to run up to 100 routers that act as nodes in a network. You can access each node using ssh to ping, create interfaces, or check the connection from one to the other.
Using Ansible, you can use the qemu_cloud_start.yml file to pre-configure and fine-tune the layout of your network before starting the virtualization in batches. (This makes it easier to specify which node should be an edge node, connected to the network, etc.) This .yml file works as a blueprint for Ansible, which builds your interfaces and network of nodes from the instructions it contains. You can then run the nodes as a single batch, and connect to each of them using ssh.

However, virtualization is limiting when trying to understand a system. Simulated cables connecting the nodes of a network are harder to visualize than cables plugged in by hand, and you often encounter problems that would not happen if you were working on actual hardware.
For example, connecting the virtual machine to the Internet is challenging (and does not work 100% of the time), because you need libraries whose presence is not checked when starting the virtual machine software, and the error messages you get from the machine are not clearly related to what the actual problem is.

Alternative: Working on actual hardware

I am currently waiting to receive a few routers that a community member (Aparcar) sent to me. I plan on using these routers to set up a LibreMesh network, and implement the changes I made in a more physical way than my current virtualization setups.
This setup will allow me to bypass some of the problems I am regularly encountering specifically because of the virtualization, at the cost of a slower iteration cycle. I’m looking forward to setting up and tinkering with actual hardware!

Next steps

  • The biggest next step is implementing code that can actually detect the specified configurations. The foundation for this is done, but more work is needed to make code that can fit the use cases.
  • From the mailing list, a lot of different solutions were discussed. We seemingly have more solutions than problems, so the next steps are implementing some of those solutions, seeing how they fit together with the current LibreMesh code, and choosing which are the best candidates to solve the auto-configuration problems.
  • Another big next step is transferring my changes to an actual router running LibreMesh code, to be able to test and code in a real-world setting.

Bonus learning material

Q&A video

A big help in this project was a Questions & Answers session we organized with seasoned LibreMesh developers. We recorded the session, thinking that it would be good to keep track of some ‘newbie questions’ and their answers, for future onboarding. I uploaded the video to guifi-net’s peertube, using LibreMesh’s account. Thanks again to everyone who participated!
The video is available here

Vocabulary list of technologies

Another help in my learning process was a vocabulary list of the technologies I learned of, and how they were used within LibreMesh. This list gets extended every time I learn about a new part of a technology.
There were a lot more abbreviations to learn than I anticipated, and I’m still encountering new ones all the time!

Closing words

And that’s about it for a mid-project update! Thank you for reading, and don’t hesitate to send me a message if you’d like to know more.

GSoC 2024 Qaul: Qaul BLE module for Linux: Part-II

This post is a continuation of my last one, which discussed the theory behind Qaul BLE support for Linux. There, I explained the theoretical concepts, the jargon, and the high-level system design of the Bluetooth Low Energy flow using GATT. Please refer to that blog before proceeding.

In this blog, we will explore the implementation and low-level design of the BLE support in greater depth.

Introduction

We will begin with creating a GATT server serving two Bluetooth services “main_service” and “msg_service” with their respective UUIDs and characteristics for our qaul device. Then we will use the adapter to advertise the “main_service_uuid” for another qaul device to connect to the GATT server. Afterward, we will create a GATT client by starting a scanning service to discover nearby advertisements. The last thing to do is to set a few functionalities and callbacks like read and write characteristic data methods and request_read and request_write callbacks for characteristics to set a basic GATT prototype with message transfer support.

GATT Server

If you remember, the GATT data hierarchy diagram from the last blog shows that the GATT server is made up of services, each service is made up of characteristics, and each characteristic is made up of descriptors. Here, we will dive into the details of how to implement it using BlueR.

GATT Service: GATT services are a set of similar attributes in one common section of the GATT server. For the Qaul GATT server implementation, we have two services, mainService and msgService, with their UUIDs (99xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx23). Below is a struct for Service where you specify the UUID, the characteristics, and whether the service is primary or secondary.

GATT Characteristics: Characteristics are the containers for user data. For the Qaul GATT server, we have one characteristic for each service: mainChar and msgChar, with their UUIDs and respective configurations. You can specify the UUID, descriptors (if any), read permissions and callbacks under the read attribute, and write permissions and conditions under the write attribute.

Now, you can create a new session to get an instance of your Bluetooth adapter and start serving the GATT application using the above configurations. Below is a sample GATT server with read characteristics functionality enabled.

Advertisements and Device discovery

We will use the above instance of the adapter to start advertising the main_service_uuid to be discovered by other devices. (Refer to sample code).

We can start scanning for nearby devices advertising the same UUID to detect the presence of qaul. For this, BlueR gives us adapter.set_discovery_filter(get_filter()).await, a feature for filter-based UUID scanning, and adapter.discover_devices().await, which streams the discovered devices.

So far, we have created a BLE GATT server and client. The next things we will implement are confirming that a discovered device is a qaul node, maintaining a list of discovered devices to avoid the overhead of scanning the same devices again and again, and keeping track of devices that go out of range.

In the Qaul BLE architecture, we store the qaul_id in the main_service’s characteristics. Whenever a device discovers this node, it connects to it, checks for the presence of both services along with their characteristics, and marks that qaul_id as discoverable. If the conditions are satisfied, it adds the node and its properties to a lookup map. To keep track of the device’s availability after discovery, Qaul spawns a new task, out_of_range_checker, which pings recently discovered devices at regular intervals to confirm their presence.
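The bookkeeping behind such a lookup map can be sketched in Python. This is a simplification and not Qaul's Rust code: instead of actively pinging devices, it only records when each qaul_id was last seen and drops entries that have been silent for longer than a timeout. The class and method names are hypothetical.

```python
import time
from typing import Dict, List, Optional


class DeviceTable:
    """Lookup map from qaul_id to the time the device was last seen."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def mark_seen(self, qaul_id: str, now: Optional[float] = None) -> bool:
        """Record a sighting; return True if the device is newly discovered."""
        now = time.monotonic() if now is None else now
        is_new = qaul_id not in self.last_seen
        self.last_seen[qaul_id] = now
        return is_new

    def drop_out_of_range(self, now: Optional[float] = None) -> List[str]:
        """Remove and return the devices not seen within the timeout."""
        now = time.monotonic() if now is None else now
        gone = [qaul_id for qaul_id, seen in self.last_seen.items()
                if now - seen > self.timeout_s]
        for qaul_id in gone:
            del self.last_seen[qaul_id]
        return gone
```

A periodic task would call drop_out_of_range at regular intervals, while the scanner calls mark_seen and only needs to connect to a device when the call reports it as new.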

Any message to be sent can be written into msg_service’s characteristics using CharacteristicWriteMethod::Io, and any write request can be received using CharacteristicControlEvent::Write(write). The message read and write process is a bit more complex than plain function calls. We will discuss it in more depth in the next blog.

So, in the current blog, we are able to create a GATT server and a GATT client, advertise our UUIDs, and scan for nearby BLE devices. We also learned to identify the correct device and maintain the lookup table for better connectivity and more stability.

Concluding Thoughts

The first half of the GSoC coding period mostly involved code implementation for BLE connectivity. The major part was integrating the prototype to work with async Rust on multiple threads. The next half will be about implementing lossless transmission of data and experimenting with BlueR for better stability. If you want to know more about my project, please feel free to reach out. Thanks for reading!