GSoC 2017 – Add MPTCP support in LEDE/OpenWRT trunk

Brief intro

My name is Ferenc Fejes. I'm an MSc CE student at the University of Debrecen, with a specialization in infocommunication networks. I'm glad to introduce my project: adding MPTCP support to LEDE/OpenWRT. The main idea comes from my mentor, Benjamin Henrion. In a Wi-Fi mesh network, there is probably more than one network path between the nodes. The point is that we can gain additional throughput if we use those paths simultaneously.

The goal

Currently there is no MPTCP support in stock GNU/Linux or Windows systems. So the key is that the user keeps using regular TCP/UDP on their system up to the first hop, which is a LEDE box with MPTCP support. At this point, the router operates as an intercepting proxy: it reads the incoming traffic from the host and distributes it over multiple interfaces using MPTCP. At the receiver side, we do the same in reverse order: take the data from the MPTCP connection and send it to the sink host over regular TCP (or UDP). With this operation, the whole MPTCP transmission remains hidden from the end hosts, but they will enjoy the gain from it, because MPTCP is good at aggregating the bandwidth of the different paths. Of course, if the bottleneck is between the end host and the first hop, the user will not see any gain from this operation.
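
On the first-hop router, the interception could be done with a simple NAT redirect: every TCP flow coming in from the LAN is diverted to a local transparent proxy, which then opens the MPTCP connection towards the peer router. A minimal sketch, assuming a proxy listening on port 8080 and a LAN bridge named br-lan (both are placeholders, since the proxy software is yet to be chosen):

# Divert all TCP traffic arriving from the LAN to a local
# transparent proxy listening on port 8080 (hypothetical)
iptables -t nat -A PREROUTING -i br-lan -p tcp -j REDIRECT --to-ports 8080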

The plan

The key steps to reach the goal of the project:
1. Set up virtual/real machines with MPTCP support. This is not a problem, since kernel patches for MPTCP support are available.
2. Build a simple multipath topology: end hosts without MPTCP support and two machines with MPTCP support which function as routers and have two different network paths between them.
3. Find and configure an efficient, transparent proxy software on them, which is suitable for intercepting TCP traffic and can work as a tunnel for other protocols (like UDP, SCTP, etc.).
4. Reproduce the working configuration on LEDE-supported devices, with two Wi-Fi bridge links between the LEDE boxes.
5. Verify the aggregation with iperf3 measurements (a sketch follows below).
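
For step 5, the verification could be as simple as this (the IP address is a placeholder):

# on the sink host behind the far LEDE box
iperf3 -s
# on the source host behind the near LEDE box: 30-second TCP test
iperf3 -c 192.0.2.10 -t 30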

GSoC 2017 Project – OpenWifi: LEDE/OpenWRT Configuration Management

GSoC 2017 – OpenWifi

Hi, my name is Johannes Wegener and I’ll be working on OpenWifi this Google Summer of Code. I’m 27 years old and study computer engineering at TU Berlin. In this blog post I’m going to explain to you what OpenWifi is and what should be done during this summer of code.

What is OpenWifi?

OpenWifi is an OpenWRT/LEDE configuration management system. It is intended to manage a larger or smaller number of OpenWRT/LEDE devices.

If a new access point joins a network, it is able to auto-detect the management server and register with it. After the node has been registered, its configuration (static configuration like what you'll typically find in /etc/config/ on OpenWRT/LEDE devices) is downloaded by the server and stored in a database.

It is now possible to query and modify aspects of the configuration. It is also possible to change values depending on other values, which means configuration changes can be applied to a lot of different routers at once. There is an older templating system, which shall be replaced by a new graph-oriented one. OpenWifi also manages SSH keys, has a rudimentary file upload functionality (for uploading new images, for example) and an extensible API. It is possible, for example, for a node to ask for an image it should install. (Currently this is not very dynamic, but the intention is that an image could be selected based on various parameters.)

The server also regularly pulls the health state of the nodes and displays it in an overview. Furthermore, the server acts as a luci2 proxy.

It is also extensible via PlugIns and is able to serve as an entry point to other services like Icinga or location services. (Both already have a proof-of-concept PlugIn available.)

How does it work? aka what makes it tick?

The main software is written in Python and uses the Pyramid framework and SQLAlchemy as ORM. It uses pyramid-rpc for JSON-RPC requests (which are used by nodes to register, for example) and Cornice for a REST-style API (which is intended for the users of the system). The core system can be found in this repository.

Most of the tasks operating on nodes are done by a jobserver that uses celery – so that they don’t block the main thread.

All web views have been moved to a different repository and are in fact realized as a PlugIn. (If you just want to manage your nodes on the CLI or with scripts, this will be possible in the near future!)

The nodes use a notification script written as a shell script. It uses a fixed DNS entry (openwifi), a configuration file (/etc/config/openwifi) or mDNS (using umdns or avahi) to detect a server and register with it. The packages can be found in the openwifi-feed repository. There is also a boot-flasher which is intended for flashing an alix2-style board from ramfs; since it is also possible to execute commands on the node, I'll integrate an update solution that uses sysupgrade.

The communication between the server and the node is currently realized via rpcd. One goal during this Google Summer of Code is to abstract that and realize it via an rpcd-communication-PlugIn; this would also make it possible to have a NetJSON-communication-PlugIn, for example.

The PlugIns are realized with Python entry points. There are some specially named entry points which extend the main application. You could have a look at the example plugin to get an idea of how it works.
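
As a rough illustration, a PlugIn might declare its entry points in its setup.py like this; the entry point group name 'openwifi.plugin' and the callable are made up for this sketch, the real names are defined by the example plugin:

# setup.py of a hypothetical OpenWifi PlugIn; the group and
# callable names are illustrative only
from setuptools import setup

setup(
    name='openwifi-example-plugin',
    version='0.1',
    packages=['openwifi_example'],
    entry_points={
        'openwifi.plugin': [
            'example = openwifi_example:register',
        ],
    },
)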

How to try it?

You can easily try the software with Docker. Navigate to the Docker directory; there are three files that will assist you with the setup. In conf.sh you configure whether you want to use LDAP for authentication, avahi for mDNS announcement and dnsmasq as DHCP server in the Docker container.

With build_image.sh you build the Docker image. It tries to detect whether you need to use sudo for Docker or not. Last but not least, you need run_image.sh to start the image.

You can easily add PlugIns to the Docker image by checking them out or copying them inside the Plugins directory. To start off, I would recommend adding the Web-Views PlugIn and using avahi. Now you can navigate your browser to http://localhost:6543 to see the web views. (You need to restart the container after installing a PlugIn; use docker stop OpenWifi and docker start OpenWifi.)
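
Summarized as shell commands, a try-out session along the lines described above might look like this:

cd Docker
# edit conf.sh to choose LDAP, avahi and dnsmasq options
./build_image.sh    # build the image; detects whether sudo is needed
./run_image.sh      # start the container
# after copying a PlugIn into the Plugins directory, restart:
docker stop OpenWifi && docker start OpenWifi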

What needs to be done?

In the next section I'll explain what should be done during Google Summer of Code. I need to prioritize these things because I'm not sure whether it is possible to do everything. I think the most important things are testing, authentication and the new graph-based database model.

Testing-Infrastructure

Currently only very few things are covered by tests. But because it is possible to start a LEDE container in Docker (there are some scripts helping to create a LEDE container in the Docker/LEDEImage directory), a lot of things can be tested with Docker orchestration. I want to provide a small LEDE image with the code to test it.

Before adding big new things, I would like to move to true test-driven development, with tests for at least 90% of the core component code, and after that continue development by writing tests first.

For that I would also have a look at coverage.py.

Security and authentication

There is a simplistic authentication for OpenWifi with an LDAP backend. I would like to add authentication based on user/password (without LDAP), API keys and client-side certificates (for the nodes to authenticate themselves against the API).

I would like to make authorization as granular as possible. Authorization should check whether the authenticated identity is allowed to operate on a given node and what kind of action is allowed.

Security is provided by TLS. Last week I added the last bits to also use HTTPS for client communication, but right now it needs to be set up by the user. I would like to add the appropriate hooks to the notification shell script.

Expand new graph-based DB-Model

The new DB configuration model is based on a graph, where configurations can be connected by links. A link can contain additional information, like what kind of link it is; currently the additional data is the option name (for example, in a wifi-iface configuration the "device" option links it to a wifi-device).

The model is implemented, and there is a query API to get and set options. But for communication purposes the configuration is stored as a uci-json-config. There is code to convert between the two (though the conversion from UCI to the graph model needs quite a bit more intelligence to detect everything correctly). There are also some hooks that update each part. But the update process is not consistent; the graph-based config gets a new ID, for example. It would be nice to have a consistent conversion.

Furthermore, I would like nodes to share parts of a configuration. This should be easy to implement with some small DB changes. For this, consistency is also very important. By default, UCI generates unique strings for every configuration; these could be used for consistency.

The DB model also currently lacks support for creating new configurations and reordering configurations.

API

The API to modify nodes should be expanded and everything should be documented (see above). This should be done in code, with a document generated by Sphinx.

I would also like to implement a CLI program to use the API.

Versioning

The uci-json parser supports diffs between configurations. I would like to save a diff for each change that is applied, in order to be able to roll back configurations or delete a specific change.

A simplistic revisioning is already implemented, but the diff handling should be moved into a class of its own with JSON import and export and with operations to apply a diff to a UCI configuration (upgrade and downgrade).
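
Such a class might look roughly like this sketch (the names and the diff structure are mine for illustration, not OpenWifi's actual code):

# Hypothetical sketch of a diff class for uci-json configurations
import json

class ConfigDiff:
    def __init__(self, added=None, removed=None):
        self.added = added or {}      # option path -> new value
        self.removed = removed or {}  # option path -> old value

    @classmethod
    def from_json(cls, data):
        parsed = json.loads(data)
        return cls(parsed.get('added'), parsed.get('removed'))

    def to_json(self):
        return json.dumps({'added': self.added, 'removed': self.removed})

    def upgrade(self, config):
        # apply the change to a configuration dict
        for path in self.removed:
            config.pop(path, None)
        config.update(self.added)

    def downgrade(self, config):
        # roll the change back
        for path in self.added:
            config.pop(path, None)
        config.update(self.removed)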

Modularity and expandability

OpenWifi is already quite modular, but I also want to modularize configuration parsing and communication. It would also be nice to have some interoperability with OpenWISP.

I use Alembic for tracking database changes; it would be nice to find a good way to use Alembic for database changes needed by PlugIns as well.

Scheduled Updates

I want to schedule config updates; for example, if you have a mesh you probably want to update the most deeply nested nodes first and then the ones above, etc. This should be easy with Celery (as long as the mesh is known and somehow represented in the database).

DynaPoint Final

Hi everyone,

this is the final blog post about DynaPoint. A short recap: I created a daemon which regularly monitors Internet connectivity and, depending on the result, activates or deactivates the proper access points. That way, handling APs becomes easier, as you can already tell the connectivity status from the AP's SSID. I also created a LuCI component for it to make the configuration easier.

In the past weeks I was able to add some new features, fix bugs and complete the LuCI component. Especially the latter was really interesting and taught me a lot about how LuCI works.

In the last post I mentioned that it's better to verify Internet connectivity by using wget instead of just pinging an IP address. Consequently, I switched from Pingcheck to wget. I also added an option to use curl instead of wget. With curl you also get the option to choose the interface for the connection.
When you provide Internet via a VPN interface, you can now explicitly check the connection of that interface. The reason why I don't use curl as the default is curl's rather large size. For some routers with only 4 MB of storage it might be too much.

I also added an "offline threshold", which delays the switch to offline mode. For example, when you set the interval to 60 seconds and offline_threshold to 5, the switch to offline mode is made after 5 checking cycles (= 300 seconds).

So what does an example configuration look like?

To use DynaPoint, just add dynapoint_rule 'internet' and dynapoint_rule '!internet' to the desired sections in /etc/config/wireless:

config wifi-iface
    option device 'radio0'
    option network 'lan2'
    option mode 'ap'
    option encryption 'none'
    option ssid 'freifunk'
    option dynapoint_rule 'internet'

config wifi-iface
    option device 'radio0'
    option network 'lan2'
    option mode 'ap'
    option encryption 'none'
    option ssid 'freifunk-maintenance'
    option dynapoint_rule '!internet'

The configuration of dynapoint takes place in /etc/config/dynapoint:

config rule 'internet'
    list hosts 'http://www.example.com'
    list hosts 'http://www.google.com'
    option interval '60'
    option timeout '5'
    option offline_threshold '3'
    option add_hostname_to_ssid '0'
    option use_curl '1'
    option curl_interface 'eth0'

All of that can also be configured via LuCI:

If you want to try out DynaPoint for yourself, please visit https://github.com/thuehn/dynapoint for more information.

Future work

Currently only one AP per state is supported. In the next weeks I want to add support for multiple APs per state.
I also want to add support for more rules. At this time, only the one rule "internet" is supported. I want to make this more generic and provide support for custom rules.

Acknowledgements

I want to thank my mentor Thomas Hühn for his excellent mentoring and great ideas during this project.
Also, of course, thanks to Freifunk for letting me realize this project and thanks to Google for organizing GSoC.

GSoC: The ECE configuration system – summary

The Google Summer of Code is almost over, so in this blog post I'll give an overview of the targets I've met (and those I haven't).

Code repositories

  • https://gitlab.com/neoraider/ece/commits/gsoc2016 (daemon, client libraries, CLI client)
  • https://gitlab.com/neoraider/pkg-ece/commits/gsoc2016 (OpenWrt/LEDE package feed)
  • https://gitlab.com/neoraider/uci-ece/compare/gsoc2016-upstream…gsoc2016 (UCI ECE backend)

All code in the first two repositories has been developed by me during GSoC. The third link shows the work I've done to integrate an ECE backend into libuci.

What is working

As described in earlier posts, my GSoC project was a configuration storage system for OpenWrt/LEDE, trying to solve various issues of the UCI config system. The principal points of this new system are:

    • ubus-based config daemon maintaining a central storage database file
    • JSON-based configuration data model
    • Validation based on simplified JSON-Schema

The Wiki at https://gitlab.com/neoraider/ece/wikis/home gives a good overview of the design and the usage of ECE and describes many features in detail. The pkg-ece package feed can be used to build and install the different components of ECE on OpenWrt/LEDE easily.

If you’ve worked with OpenWrt/LEDE, you probably know the UCI config system. A UCI config file looks like this:

config system
        option hostname 'lede'
        option timezone 'UTC'

This format is very simple: each file (called a "package") has a number of sections (named or unnamed) of different types (this example from the "system" package has a single unnamed section of type "system"). These sections contain options with single values or lists of values.

Unnamed sections are usually accessed using indices, for example a command to set the hostname would look like this:

uci set system.@system[0].hostname='betterhostname'

With the simplicity of UCI, there come various issues and missing features; these are only a few of them:

      • The fixed data model (package/section/option) makes some kinds of configuration very awkward: in the example above, the index 0 must be given for the system section, even though having a second section of this kind would not make sense. In other cases, deeper configuration trees must be flattened to be stored in UCI, making the configuration harder to understand.
      • All values in UCI are strings, which often causes inconsistencies (booleans are usually stored as '0'/'1', but several other pairs like 'false'/'true' and 'off'/'on' are supported as well; different users of UCI sometimes parse numbers differently).
      • UCI doesn't have built-in validation. Frontends like LuCI usually validate the entered data, but as soon as the CLI client is used, no validation is done.
      • UCI always stores the whole configuration file and not only the changes from the defaults, making the storage inefficient on the overlay-based filesystem setups that are common on OpenWrt/LEDE.
      • In some situations, upgrades to default values should also affect the effective values, but only if the user didn't change the values themselves. With UCI this is not possible, as it doesn't store the information whether a value was changed by the user.
      • UCI allows comments in config files, but they are lost as soon as libuci or the CLI tool is used to modify them.

The configuration given above could be represented in ECE as this JSON document:

{
  "system": {
    "hostname": "lede",
    "timezone": "UTC"
  }
}

Note that this is only the external representation of the configuration; internally, it is stored in a more efficient binary format.

JSON gives us a lot of features for free: arbitrary configuration trees with proper data types. Existing standards and standard drafts like JSON Pointer and JSON Schema can be used to reference and validate configuration (the JSON Schema specification is simplified for ECE a bit though to allow more efficient validation on embedded systems).

The command for changing the hostname would look like this in ECE:

ece set /system/hostname '"betterhostname"'

The quoting is currently necessary to make the string a valid JSON document; this may change in a future version.

The whole configuration is saved in a single JSON document, but the specific format is not defined by a single schema; instead, each package can provide a schema, and the configuration tree is validated against a merged schema definition.

The schemas also provide default values for the configuration. Adding documentation for the configuration options to the schemas is planned as a future addition and might be used to support the user in configuration utilities and automatically generate web-based or other interfaces.
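
For illustration, a schema fragment covering the "system" package from the example above might look roughly like this (the exact shape of ECE's simplified JSON Schema dialect may differ):

{
  "type": "object",
  "properties": {
    "system": {
      "type": "object",
      "properties": {
        "hostname": { "type": "string", "default": "lede" },
        "timezone": { "type": "string", "default": "UTC" }
      }
    }
  }
}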

This is only a small example of the usage of ECE; the above-mentioned ECE Wiki contains much more information about the usage of ECE and the ideas behind it.

In addition to the daemon and a simple CLI utility, I've developed libraries for C, Lua and shell which allow access to the configuration. While there are still some features missing (some points for future work are given in https://gitlab.com/neoraider/ece/wikis/todo ), I think most of the missing pieces can be added in the near future.

The UCI/ECE bridge

When I proposed my project for GSoC, I didn't aim to make it a full replacement for the current UCI system, at least not in the near future. While the possibility of moving some UCI config files into the ECE config database had been my plan from the beginning, my ideas for backwards compatibility didn't go further than a one-time import from UCI to ECE, and one-way generation of UCI config files from ECE.

After talking to a few LEDE developers and package maintainers, it became clear to me and my mentors that many people are interested in replacing UCI with a better system in the not-too-distant future. But for ECE to become this replacement, a real two-way binding between UCI and ECE is necessary to allow gradual migration, so that configuration utilities like LuCI (and many other utilities that somehow interact with UCI) don't need to be adjusted in a flag-day change.

An incomplete design draft for this UCI/ECE bridge is outlined in https://gitlab.com/neoraider/ece/wikis/design/uci-bridge . The code in the UCI ECE backend repository linked above implements a part of this bridge (it can load "static" and "named" bindings from ECE into UCI, and commit "static" bindings back to ECE) and has been implemented as an API- and ABI-compatible extension to libuci. The development of this bridge has taken a lot of time (much more than I had originally scheduled for UCI compatibility features), as the data models of UCI and ECE are very different.

Future work

Of all points given in https://gitlab.com/neoraider/ece/wikis/todo , finalizing the database format is the most important, as any future change in the storage format will either break compatibility or involve some kind of conversion. When it is clear the format won’t be changed anymore, ECE should be added to the OpenWrt community package repository to make it easily accessible to all OpenWrt and LEDE users.

After that, other points given in the TODO should be dealt with, but none of those seem too pressing to prevent actually using ECE for some software (but some of the points given in the first section of the TODO page would need to be addressed to properly support software that requires more complex configuration).

Last, but not least, I'd like to express my gratitude to my mentors and to all the people in the OpenWrt, LEDE and Freifunk communities who have helped me develop ECE by giving guidance and lots of useful feedback, and to Google, who allowed me to focus on this project throughout the summer.

Google Summer of Code 2016: External netifd Device Handlers – Final Milestone

TL;DR

FINISHING THE PROJECT

The past weeks have felt very satisfying in terms of coding. With the main structure in place, all I had to do was add features, test, iron out the wrinkles and prepare my code for submission to the maintainer. Adding features was fun because it felt like taking big steps forward every day with immediate feedback.
There were a few stubborn bugs, though. Despite taking the better part of a week to find, they were, luckily, easy to fix. They usually required little more than changing the order in which code executes, which meant swapping a few lines of code.
Rebasing my netifd changes onto the current version went very fast and resulted in three separate commits: two small ones to prepare the existing code base for my additions and one pretty big one adding my ~2000-line .c file.

As this is the blog post wrapping up my Google Summer of Code, I am going to summarize the entire project. This probably means that I will repeat some of the points I have already explained in my previous posts.

GOALS MET

I set out to implement a way to open up OpenWRT/LEDE's network interface daemon (netifd) to new device classes.
Until now, netifd was only able to handle device classes for which a handler was hard-coded into the daemon. I added a way to generate the necessary device handler stubs from JSON descriptions and to interface via ubus with another process that handles all device-related actions such as creation and configuration. This makes netifd more open to experimentation and means it does not require maintenance from someone introducing a new device class.
Configuration of these devices still happens in the familiar /etc/config/network file.

Along with the ubus interface in netifd, I wrote ovsd, an external device handler for Open vSwitch. Together with ovsd, netifd can create and configure Open vSwitch bridges without any Open vSwitch-specific code added to it. It simply parses the configuration in /etc/config/network, sees that there is an interface on a device with type 'Open vSwitch', uses this name to look up the corresponding device handler from a list and calls the 'create' function that is part of the device handler interface.
The Open vSwitch device handler stub then relays the command along with the configuration blob to ovsd via ubus.

Ovsd processes the command asynchronously and answers using the ubus subscription mechanism. Netifd can then bring up the device as usual and attach protocol handlers and interfaces to it.
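
Conceptually, the relayed request resembles a manual ubus invocation like the following; the payload shape is illustrative, not the exact message format:

# 'ovs' is the ubus name of the external device handler (see the
# JSON description below); the payload is a hypothetical example
ubus call ovs create '{"name": "ovs-lan", "ifname": ["eth0"]}'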

My work on netifd is not likely to be included upstream before GSoC ends, which is why I have created a patch with my changes and put it in a repository here. The repo includes the LEDE source code as a git submodule and is easy to build.
The ovsd source code is hosted on GitHub along with instructions on how to install it.

DETAILS OF THE SUBMITTED WORK

This example file demonstrates all the available configuration options for Open vSwitch devices in a possible scenario featuring an Open vSwitch bridge with interface eth0 and a fake bridge on top of it:

# /etc/config/network
config interface 'lan'
    option ifname 'eth0'
    option type 'Open vSwitch'
    option proto 'static'
    option ipaddr '1.2.3.4'
    option gateway '1.2.3.1'
    option netmask '255.255.255.0'
    option ip6assign '60'

    option ofcontrollers 'tcp:1.2.3.2:9999'
    option controller_fail_mode 'standalone'
    option ssl_cert '/root/cert.pem'
    option ssl_private_key '/root/key.pem'
    option ssl_ca_cert '/root/cacert_bootstrap.pem'
    option ssl_bootstrap 'true'

config interface 'guest'
    option type 'Open vSwitch'
    option proto 'static'
    option ipaddr '1.2.3.5'
    option netmask '255.255.255.0'

    option parent 'ovs-lan'
    option vlan '2'
    option empty 'true'

The lines set apart are Open vSwitch-specific options. They configure OpenFlow settings, control channel encryption and the nature of the device itself, which can be either a 'real' bridge or a 'fake' bridge that sits on a parent bridge and has VLAN tagging enabled.

This is the JSON description from which the device handler stub is generated:

# /lib/netifd/ubusdev-config/ovsd.json
{
    "name" : "Open vSwitch",
    "ubus_name" : "ovs",
    "bridge" : "1",
    "br-prefix" : "ovs",
    "config" : [
        ["name", 3],
        ["ifname", 1],
        ["empty", 7],
        ["parent", 3],
        ["vlan", 5],
        ["ofcontrollers", 1],
        ["controller_fail_mode", 3],
        ["ssl_private_key", 3],
        ["ssl_cert", 3],
        ["ssl_ca_cert", 3],
        ["ssl_bootstrap", 7]
    ],
    "info" : [
        ["ofcontrollers", 1],
        ["fail_mode", 3],
        ["ports", 1],
        ["parent", 3],
        ["vlan", 5],
        ["ssl", 2]
    ]
}

The first four fields tell netifd the type of the devices and the name of the external handler to connect to via ubus. "bridge" and "br-prefix" signal the bridge capabilities of the devices and give a short prefix to prepend to devices of this type, creating devices named "ovs-lan" and "ovs-guest" for the interfaces "lan" and "guest", respectively.
The "config" and "info" fields detail the configuration parameters and their data types for device creation and for responses to the 'dump_info' method of the external device handler (see my earlier blog post).
I have created a manual on how to write a JSON description of an external device handler that goes into greater detail. You can find it here.

POSSIBLE LIMITATIONS AND OPEN WORK

Because of the nature of the device handler I implemented, I could only test scenarios that arise from using Open vSwitch. These are bridge-like devices by design, and not every device class someone might want to integrate with netifd will behave like them. I had to anticipate much of the behavior of non-bridge-like devices when trying to make the ubus interface device-class agnostic.

Also, while I tested ovsd with all the complex setups I could think of, using fake bridges and wireless interfaces, there are probably edge cases where my implementation is insufficient. Ovsd may have to become stateful at some point.

At the moment, the JSON files from which the device handler stubs are created are a bit "human-unfriendly". Parsing the JSON objects fails unless they are written on a single line with proper EOF termination. Allowing pretty-printed JSON would greatly increase readability. What's more, the parameters' data types are cryptic enumeration values from libubox. Another translation layer turning the numbers into readable types like "string" would also help users read and write description files.
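
For reference, the numeric type codes are the values of enum blobmsg_type from libubox's blobmsg.h; a translation layer would essentially map these names to the numbers used in the description files:

/* from libubox/blobmsg.h */
enum blobmsg_type {
    BLOBMSG_TYPE_UNSPEC, /* 0 */
    BLOBMSG_TYPE_ARRAY,  /* 1, e.g. "ifname", "ofcontrollers" */
    BLOBMSG_TYPE_TABLE,  /* 2, e.g. "ssl" */
    BLOBMSG_TYPE_STRING, /* 3, e.g. "name", "parent" */
    BLOBMSG_TYPE_INT64,  /* 4 */
    BLOBMSG_TYPE_INT32,  /* 5, e.g. "vlan" */
    BLOBMSG_TYPE_INT16,  /* 6 */
    BLOBMSG_TYPE_INT8,   /* 7, used for booleans like "empty" */
};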

IMPROVEMENTS TO MAKE

In the future, I would like to add a way for external device handlers to send detailed feedback about the requests they receive back to netifd within the context of the request. This way, authors of external device handlers could provide device-specific information such as specific error messages when a command fails.
All the building blocks for such a mechanism are there:
  – ubus replies that are logically connected to an ongoing request and
  – the possibility to tell netifd what messages to expect via an entry in the JSON description file.

This way, messages and their formats could be defined by the authors of external device handlers on a per-device-class, and even a per-command, basis. Netifd would then be able to handle these messages and write them to the logs along with information about the context of the ongoing transaction.

MY EXPERIENCE WITH GSOC

All in all, GSoC was a blast. Although challenging and frustrating at times, I am happy to have done it. This was the first time I had to read other people's code to this extent and at this level of, for lack of a better word, sophistication. It took me the better part of a year to really get a grip on the code base when I was working with it as part of a student project at TU Berlin, but once I had a feeling for its structure, playing around with the features I added was fun. It feels rewarding to have worked on and contributed to "real world" free software.
On the organizational side, I am very happy with both Freifunk's and Google's way of managing the process. I always knew what was expected of me and was provided the necessary information in advance.

ACKNOWLEDGEMENTS

Now that my first GSoC is over, some thanks are due. First and foremost, I would like to thank my mentor and advisor Julius Schulz-Zander for introducing me to GSoC and for his counsel throughout the summer. Thanks to Felix Fietkau, original author and maintainer of netifd, from whom I've learned a lot, both indirectly by working with his code and directly when receiving feedback on my work. Finally, I want to thank Freifunk for letting me do this project and, of course, Google for organizing GSoC.

DynaPoint update

Hi everyone,

here are some updates regarding DynaPoint. The idea was to create a daemon which regularly checks the Internet connection and changes the active access point depending on the result. That way, handling APs becomes easier, as you can already tell the connectivity status from the AP's SSID.

A daemon with basic functionality is already working. After installation, one configuration step is necessary:
you have to choose in /etc/config/wireless which AP should be used when Internet connectivity is available and which one when connectivity is lost. You can do that by adding "dynapoint 1" or "dynapoint 0" to the respective wifi-iface section.

You can also configure DynaPoint via LuCI, although that part is not yet complete.
I was struggling a bit with it, because the LuCI documentation is rather minimalistic.
Here is a screenshot of how it currently looks:

Next steps

To verify Internet connectivity, it is probably better to make a small HTTP download than to just ping an IP address.
Using "wget --spider" should be suitable for that.
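
A connectivity check along those lines could look like this; wget returns a non-zero exit status when the request fails:

# exit status 0 means the HTTP request succeeded
wget --spider -q http://www.example.com && echo online || echo offline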

Also, I will see if I can get rid of the required configuration step in /etc/config/wireless in the next weeks and provide fully automatic configuration.

If you want to test dynapoint for yourself, just go to https://github.com/thuehn/dynapoint.

Google Summer of Code 2016: External netifd Device Handlers – Milestone 1

OVERVIEW OF THE LAST WEEKS
During the last five to six weeks I have implemented the possibility to include wireless interfaces in Open vSwitch bridges, rewritten a lot of the code for creating bridges with external device handlers and brought my development environment up to speed. I am now working with an up-to-date copy of the LEDE repository.
I have also made it possible for users of external device handlers to define what the information and statistics of their devices look like and to query the external device handler for that data through netifd.

CHANGES TO THE DEVELOPMENT ENVIRONMENT
So far, I had been using quilt to create and manage patches for my alterations of the netifd code.
One day they were completely broken. I still do not know what happened or how, but it was not the first time, and this time recovery would have been way too tedious.
This is why I switched to a git-only setup. I now have a clone of the netifd repo on my development machine that is symlinked into the LEDE source tree using the nice 'enable package source tree override' option in the main Makefile. I used the opportunity to update both the LEDE source tree and the netifd repository to the most recent versions.
Before, I was working on an OpenWRT Chaos Calmer tree because of a bug causing the Open vSwitch package to segfault with more recent kernels.
Now everything is up-to-date: LEDE, netifd and Open vSwitch.

MY PROGRESS IN DETAIL

More Dynamic Device Creation
In actual coding news, I have refined the callback mechanism for creating bridges with external device handlers and the way they are created and brought up.
Previously, a bridge and its ports were created and activated immediately when /etc/config/network was parsed. Now, the ubus call is postponed until the first port on the bridge is brought up.
Because of the added asynchronicity, I had to add a 'timeout and retry' mechanism to keep track of the state of the structures in netifd and the external device handler.

A few questions have come up regarding the device handler interface. As I explained in my first blog post, I am working on Open vSwitch integration into LEDE by writing an external device handler called ovsd. Obviously, this is very useful for testing as well.
I have come across the issue of wanting to disable a bridge without deleting it. This means bringing down the bridge and removing the L2 devices from it. The device handler interface that I mirror for my ubus methods doesn't really have a method for this. The closest thing is 'hotplug remove', which feels a bit like a dirty hack to use.
I have reached out to netifd's maintainer about this issue. In the meantime, I am sticking to (ab)using the hotplug mechanism.

On the ovsd side I have added a pretty central feature: OpenFlow controllers. Obviously, someone who uses Open vSwitch bridges is likely to want to use its SDN capabilities.
Controllers can be configured directly in /etc/config/network with the UCI option 'ofcontrollers':

config interface 'lan'
        option ifname 'eth0'
        option type 'Open vSwitch'
        option proto 'static'
        option ipaddr '1.2.3.4'
        option gateway '1.2.3.1'
        option netmask '255.255.255.0'
        option ofcontrollers 'tcp:1.2.3.4:5678'
        option controller_fail_mode 'standalone'

The format in which the controllers are defined is exactly the one the ovs-vsctl command line tool expects.
The other new UCI option, controller_fail_mode, configures the bridge's behavior in case the configured controller is unreachable. It is a direct mapping to the ovs-vsctl command 'set-fail-mode'. The default behavior in case of controller absence is called 'standalone', which makes the Open vSwitch behave like a learning switch. 'secure' disables the adding of flows if no controller is present.
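
For comparison, what ovsd configures for the bridge ovs-lan corresponds roughly to these manual ovs-vsctl commands:

ovs-vsctl set-controller ovs-lan tcp:1.2.3.4:5678
ovs-vsctl set-fail-mode ovs-lan standalone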

Function Coverage: Information and Statistics
Netifd device handlers have functions to dump information and statistics about devices in JSON format: 'dump_info' and 'dump_stats'. Usually these just collect data from the structures in netifd and the kernel, but with my external device handlers it is not as simple: I have to relay the query to an external device handler program and parse the response. Since the interface is generic, I cannot hard-code the fields and types of the response. This is why I relied once more on the JSON data type description mechanism that I had already used for the dynamic creation of device configuration descriptions.
In addition to the mandatory 'config' description, users can now optionally provide 'info' and/or 'stats' fields. Just like the configuration descriptions, they are stored in the stub device handler structs within netifd, where they serve as blueprints for how information and statistics coming from external device handlers have to be parsed.

For my Open vSwitch setup, it currently looks like this in /lib/netifd/ubusdev-config/ovsd.json:
{
    "name" : "Open vSwitch",
    "ubus_name" : "ovs",
    "bridge" : "1",
    "br-prefix" : "ovs",
    "config" : [
        ["name", 3],
        ["ifname", 1],
        ["empty", 7],
        ["parent", 3],
        ["ofcontrollers", 1],
        ["controller_fail_mode", 3],
        ["vlan", 6]
    ],
    "info" : [
        ["ofcontrollers", 1],
        ["fail_mode", 3],
        ["ports", 1]
    ]
}

This is how it looks when I query the Open vSwitch bridge ‘ovs-lan’:

THE NEXT STEPS

During the weeks to come I want to look into some issues which occurred occasionally when I disabled and re-enabled a bridge: some protocol-related configuration went missing; sometimes, for example, the configured IP address was gone. Something which could help me overcome this problem is also in need of some work: reloading/reconfiguring devices.
Along with this, I want to get started with the documentation to prepare for the publication of the source code.