Unit testing LibreMesh GSoC final report

I am very happy to have participated in this GSoC. I’ve learned many things and I’ve been able to implement things that I hope will be useful.

Things that have been achieved

  • Busted unit testing framework integration
  • Coverage report integration. Final coverage output
  • CI integration with Travis. The build was split into two stages: unit testing and package build.
  • I have written tests for the following core parts: lime.config, lime.network, lime.wireless, lime.utils. The quality of the tests varies: some are just stubs that we can improve in the future, but some are good.
  • Tests for packages: firstbootwizard has been improved in order to support unit testing and a first simple test is in place. Writing more tests requires more changes to FBW.
  • Integration tests: lime-config with device support
  • iwinfo fake library, with many helper functions to easily fake a device and station connected, etc.
  • Uci testing environment helpers
  • Device support: A simple device support was implemented. For the moment this needs the /etc/board.json of the device and the /etc/config/network and /etc/config/wireless files that are generated by OpenWrt on the first boot. With these files a testing environment is created using uci and iwinfo, so that a device is emulated for the tests. Using this infrastructure a lime-config test was implemented. For the moment only the LibreRouter-v1 device is supported, but it is very easy to add more devices.

Here is a reference PR with all the work I did for this GSoC. In order to have this work merged I created many small PRs in the LibreMesh repository: #562, #563, #564, #565, #566, #567 and #568. Some of the work has not yet been submitted as PRs to LibreMesh, so as not to overwhelm the reviewers.

Future work

  • Add more devices.
  • Discuss whether it would be helpful to write an integration test that uses lime-defaults and lime-defaults-factory with a device and checks that the result is what is expected, and if it is, write these tests.
  • Provide a way to test packages that use the ubus library.
  • Explore how to use this testing environment in other OpenWrt Lua packages outside of LibreMesh. Even C code should be easily tested with automatic Lua bindings.

Lessons learned

Unit testing framework

After working with Busted I think it has been an excellent choice as unit testing framework. It is very well documented, very powerful and at the same time easy to use. I used it for writing very different tests and I never missed anything. Mocks and stubs are good and asserts are powerful.

OpenWrt Unified Configuration Interface (UCI) library

At first my idea was to create a fake library because I thought that this could be easy and at the same time very handy for the tests. I quickly implemented a fake library, but it did not behave the same as the uci library in many corner cases. I realized that making it behave exactly the same would take a lot of work, and that if it did not behave exactly the same it would be very annoying, because the tests would work differently than production. And that is a very bad idea.
So I decided to try to use the real UCI library and create a clean environment for each test with helper functions. It was very easy to do, as UCI provides a way to change the config environment.
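The helpers in the project are written in Lua and are not reproduced here. As a rough command-line analogue of the same idea, the sketch below gives each test its own configuration directory and points the uci tool at it with its -c (config directory) and -P (default directory for uncommitted changes) options. The function names new_uci_env, uci_in_env and cleanup_uci_env, the directory layout and the sample option are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: one throw-away UCI environment per test (hypothetical helper names).

# Create an isolated directory tree and print its path.
new_uci_env() {
    env_dir="$(mktemp -d)" || return 1
    mkdir "$env_dir/config" "$env_dir/delta" || return 1
    : > "$env_dir/config/lime"    # start from an empty "lime" config file
    printf '%s\n' "$env_dir"
}

# Run uci against the isolated directories instead of /etc/config.
uci_in_env() {
    env_dir="$1"
    shift
    uci -c "$env_dir/config" -P "$env_dir/delta" "$@"
}

# Remove the environment once the test is done.
cleanup_uci_env() {
    rm -rf "$1"
}

# Demo, executed only where the uci binary is installed.
if command -v uci >/dev/null 2>&1; then
    env_dir="$(new_uci_env)"
    uci_in_env "$env_dir" set lime.system=lime
    uci_in_env "$env_dir" set lime.system.hostname='test-node'
    uci_in_env "$env_dir" commit lime
    uci_in_env "$env_dir" get lime.system.hostname
    cleanup_uci_env "$env_dir"
fi
```

Because every test starts from a fresh directory, state cannot leak from one test to the next, and the code under test still talks to the real UCI implementation.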

Docker image

A side effect of basing the testing Docker image on Alpine Linux is that it is ABI compatible with OpenWrt x86_64 packages, because both use the musl C library. This allows us to easily use some OpenWrt libraries like luci.ip, uci, etc. directly from public OpenWrt packages. This keeps the testing maintenance effort low, as we do not have to build these libraries ourselves.

Lua is powerful

Coming from a Python background, I thought I would miss many things, but from a language perspective that was not the case!

Links to older posts

First post, second post, third post, fourth post.

Besides commenting on this blog, you can also write to me at spiccinini _ altermundi _ net if you want to discuss anything.

Best!

Load-correlated distributed bandwidth analysis for LibreMesh networks – #4: Conclusions and further work

Here I describe everything I did for my Google Summer of Code project this year.

First of all, thanks to Freifunk and LibreMesh communities and developers for the opportunity!
The work I did is quite spread out, from general documentation to bug fixing and actual coding. I’ll try to collect everything in a more-or-less ordered fashion.

Compiling the firmware: methods, fixes and documentation

At the beginning of my GSoC, I tested various methods for compiling the latest LibreMesh firmware.

OpenWrt buildroot

At first I tried using the LibreRouter organization fork of the OpenWrt source code repository. After updating a small thing here (merged PR), I decided to directly use the OpenWrt repository on the openwrt-18.06 branch, in order to have all the fixes which will enter the next OpenWrt release in the 18.06 family.

As explained in my second blog post, I decided to compile all the LibreMesh packages but not to include them in the binary image. This allowed me to flash a safe image (plain OpenWrt) and to add the juicy bits using OPKG from a local-network package repository. Looking back, maybe this was overkill and including all the packages in the images would have been just fine.

The list of packages I selected, and which I suggest using as the default for the next LibreMesh release, is:

check-date-http first-boot-wizard hotplug-initd-services lime-app lime-debug lime-hwd-ground-routing lime-hwd-openwrt-wan lime-proto-anygw lime-proto-babeld lime-proto-batadv lime-proto-wan lime-system shared-state shared-state-babeld_hosts shared-state-dnsmasq_hosts shared-state-bat_hosts shared-state-dnsmasq_leases shared-state-nodes_and_links lime-docs lime-docs-minimal libremap-agent

I documented the process here.

lime-sdk

lime-sdk was the recommended local compilation method for the last stable release, LibreMesh 17.06. I fixed its master branch here (merged PR), but more problems persist; see my issues here and here. I didn’t try to fix them, as the main developer decided to drop support for the latest stable release (see here) and it won’t be used for the next release anyway.

openwrt-metabuilder

I happened to find Paul Spooren’s openwrt-metabuilder, which has the potential to provide the same user experience as lime-sdk. I fixed a small thing in the examples here (merged PR) and created two new examples: one for compiling LibreMesh 17.06 and another for compiling the latest code; they can be found here (open PR). This system downloads and installs compiled packages, which for the latest LibreMesh code are compiled by Travis continuous integration. The Travis configuration was broken; I updated it here (merged PR) and it works again. The list of packages being compiled was not complete, so some of those needed for the latest LibreMesh could not be installed; I added all of them to the to-be-compiled list here (open PR).

Documentation on compilation

My conclusions were used to update the compilation instructions on the LibreMesh website; these changes, together with plenty of other updates and improvements, can be read here (open PR).

One thing the documentation is still missing is how to use network-profiles (introduced with LibreMesh 17.06 to be used with lime-sdk for community-wide network customization) with the OpenWrt buildroot. openwrt-metabuilder already supports them: simply indicating the network-profiles name as a package to install works. I started some discussion on the topic here.

Test network: supporting unsupported routers and unexpected bugs

Supporting more routers

The LibreMesh default configuration creates three interfaces on each radio (two access points with different ESSIDs and one IEEE802.11s mesh). This works on a very limited set of routers, which are the officially LibreMesh-supported ones. I own many home routers from various ISPs, which are perfectly supported by OpenWrt but not by LibreMesh, and I wanted to expand LibreMesh support to these abundant and “free as in free beer” routers.

On LibreMesh, by default, the routing (BATMAN-adv and Babeld) happens on top of IEEE802.11s mesh interfaces. For using these routers I had to expand the configuration scope for AP and client interfaces and the result can be seen here (open PR).

Memory leak of YouHua WR1200JS on ethernet when using VLAN 802.1ad

While testing with the LibreMesh-supported routers I have, TP-Link WDR3600 and YouHua WR1200JS, I ran into some interesting trouble. The first router saw the routing peers also via the ethernet cable connection, while the second didn’t. Digging deep into the packets with tcpdump on various interfaces, I realized that the YouHua WR1200JS leaks memory content (I don’t know from which memory region) into the packets when using VLANs of type 802.1ad (the common 802.1q VLAN works fine), corrupting the packets and leaking information.

I reported this here and here, and have not yet received an answer or a confirmation.

Data collection: lime-report and bandwidth-test

The objective of my GSoC included the development of reporting utilities and the smart scheduling of their execution.

Regarding the first part, I completed the development of lime-report (based on a draft by Paul Spooren) and developed from scratch bandwidth-test. The former can be seen here (open PR) and the latter here (open PR).

lime-report

lime-report is a very simple shell script that prints the output of a set of debugging commands and the content of some configuration files. A few options allow the user to select the type of information needed.

bandwidth-test

bandwidth-test is a tool for estimating the maximum available download bandwidth from the internet. In order to work even on restricted connections, it uses only HTTP connections on port 80. It has been designed to work also on a common Linux machine (it requires lua, wget and pv), not only on OpenWrt.

By default, a few large files are downloaded for 20 seconds. After this timeout, the download is interrupted and the speed is estimated. Failed downloads are ignored and more files are downloaded until there are 5 successful tests. At this point the reported value is the median of the 5 results.
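The final aggregation step can be sketched in a few lines of shell. This is not the bandwidth-test code (which is written in Lua); the function name median_of_first_successes and the input format (one result per line, with any non-numeric line such as "fail" standing for a failed download) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: ignore failed downloads, keep the first N successful results,
# and print their median. N is assumed to be odd (5 by default) and the
# input is assumed to contain at least N successful results.
median_of_first_successes() {
    wanted="${1:-5}"
    grep -E '^[0-9]+(\.[0-9]+)?$' |      # keep only numeric lines (successes)
        head -n "$wanted" |              # first N successful tests
        LC_ALL=C sort -n |               # order the speeds
        sed -n "$(( (wanted + 1) / 2 ))p"   # middle element = median
}
```

For example, `printf '%s\n' 10 fail 30 20 fail 50 40 | median_of_first_successes 5` prints 30. Using the median rather than the mean keeps one unusually slow or fast download from skewing the estimate.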

Tests scheduling at peak and night time

In order to provide interesting information, the network status and performance have to be related to the network load. Active tests, which risk affecting the users’ experience, should be run during the night, when the network is at rest, while passive tests can safely be done at the peak usage time, when problems are more likely to show up. The test results should be stored on the router to allow the diagnosis of problems after an incident.

Each router determines the peak time based on three different commands, each giving an estimate of the number of clients connected network-wide. Once a full day of load data has been collected, each router starts scheduling the passive tests at the peak time, using the classical at command. The load time-profile is constantly updated considering both the previous days’ and today’s loads.
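As a minimal sketch of the peak-time selection, assume the load profile is available as lines of the form "hour clients". That format and the function name peak_hour are assumptions made for illustration, not the format used by the actual scheduler.

```shell
#!/bin/sh
# Sketch: print the hour with the highest number of connected clients.
# Input on stdin: one "hour clients" pair per line (hypothetical format).
# Ties are resolved in favour of the earliest hour.
peak_hour() {
    LC_ALL=C sort -k2,2nr -k1,1n | head -n 1 | awk '{ print $1 }'
}
```

The result could then be handed to at, for example `echo 'lime-report' | at "$(peak_hour < load_profile):00"`, where load_profile is a hypothetical file holding the profile and lime-report stands in for whichever passive test is scheduled.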

The heaviest test to be run during the night is the bandwidth test. To avoid cross-correlation between the tests, the routers have to perform them at different times. The synchronization is obtained using the shared-state routine and assuming that all the routers’ clocks are synchronized (we are performing bandwidth tests towards the internet, so it’s safe to assume that the clocks are synchronized, either via NTP or via the check-date-http routine). The implemented strategy is:

  • Run the tests-scheduler routine at a randomized time, so that each router does it at a different time.
  • Select the 6 hours of the day in which the network load (number of clients) is lowest.
  • Read, via shared-state, the times at which the other routers announced they will run their tests.
  • Among these 6 hours, choose the one with the fewest tests scheduled by other routers.
  • Within this hour, group the other routers’ scheduled tests into 5-minute groups and choose the least populated group.
  • Randomize the execution time within this 5-minute range.
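The selection strategy described above can be sketched as follows. This is an illustration under assumptions, not the tests-scheduler code: the function name pick_test_time, the two plain-text input files (hourly loads as "hour clients" lines, and the other routers’ announced times as "HH:MM" lines) and the tie-breaking rules are all hypothetical. In the real system the announced times come from shared-state.

```shell
#!/bin/sh
# Sketch: choose a time for the night bandwidth test.
#   $1  file with one "hour clients" line per hour (hypothetical format)
#   $2  file with one "HH:MM" line per test announced by other routers
#   $3  optional random seed (defaults to the shell PID)
pick_test_time() {
    loads_file="$1"
    sched_file="$2"
    seed="${3:-$$}"

    # The 6 hours with the lowest load, lowest first.
    quiet_hours="$(LC_ALL=C sort -k2,2n -k1,1n "$loads_file" |
        head -n 6 | awk '{ print $1 }' | tr '\n' ' ')"

    awk -F: -v quiet="$quiet_hours" -v seed="$seed" '
        BEGIN { n = split(quiet, q, " "); srand(seed) }
        {   # count announced tests per hour and per 5-minute group
            h = $1 + 0; m = $2 + 0
            per_hour[h]++
            per_slot[h, int(m / 5)]++
        }
        END {
            # quiet hour with the fewest announced tests
            best_h = q[1] + 0
            for (i = 2; i <= n; i++)
                if (per_hour[q[i] + 0] + 0 < per_hour[best_h] + 0)
                    best_h = q[i] + 0
            # least populated 5-minute group within that hour
            best_s = 0
            for (s = 1; s < 12; s++)
                if (per_slot[best_h, s] + 0 < per_slot[best_h, best_s] + 0)
                    best_s = s
            # random minute inside the chosen group
            printf "%02d:%02d\n", best_h, best_s * 5 + int(rand() * 5)
        }' "$sched_file"
}
```

Ties are resolved in favour of the less loaded hour and the earlier 5-minute group. The final randomization makes it unlikely that two routers that pick the same group start at exactly the same minute.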

The code is not yet tested enough to be considered ready, but can be seen at this commit. The actual PR will have a rewritten version of this, from another branch, but this link will be kept valid for GSoC reference.

More minor fixes and documentation

I reported here and proposed a fix here (open PR) for a problem noticed by a user. Some very minor errors I noticed and fixed are here (merged PR), here (merged PR) and here (open PR).

In this already mentioned pull request I also updated and expanded the lime-example file which is the most complete documentation on the LibreMesh configuration. Some more improvements to the website are here (merged PR), here (merged PR) and in this already mentioned pull request.

Further work

  • Complete the testing of tests-scheduler
  • Use LibLogNorm for normalizing the logs collected by lime-report and reducing their size
  • Make the tests results available to an external Prometheus monitor
  • Implement a strategy for saving the test results on flash memory rather than in RAM (so that they persist across reboots): frequent writing has to be avoided to limit flash wear, so logs could be written to flash only when certain problems are detected (e.g. internet connection lost)
  • Implement a strategy for deleting old tests results when RAM or flash start getting full

Maaany hugs!
Ilario

Load-correlated distributed bandwidth analysis for LibreMesh networks – #3: Completed test network and broadened scope of the work

The planned test network has been built, employing both fully supported (I just documented them in the tested routers list here) and common home routers (officially unsupported by LibreMesh but supported by OpenWrt).

Employing unsupported routers required an expansion of my previous work on enabling an AP-sta network architecture (point-to-multipoint, from an access point to its clients) instead of the default IEEE802.11s mesh. My previous solution relied on BMX6, which will not be included in the next release in favor of Babeld, so the problem is open again. I provisionally managed to have Babeld on AP and client interfaces by adding the following settings in /etc/config/lime on the access point:

config wifi 'radio0'
     list modes 'apname'
     option country 'ES'
     option channel_2ghz '11'
     option apname_ssid 'LibreMesh.org/%H'
     option apname_key 'someAPpassword'
     option apname_encryption 'psk2'
     option distance '100'

config net 'wirelessap'
     option linux_name 'wlan0-apname'
     list protocols 'babeld:17'

and the following in the /etc/config/lime of the client (taking advantage of the client protocol I added some time ago here):

config wifi 'radio0'
     list modes 'client'
     option country 'ES'
     option channel_2ghz '11'
     option client_ssid 'LibreMesh.org/LiMe-eb7f64'
     option client_key 'someAPpassword'
     option client_encryption 'psk2'
     option distance '100'

config net 'wirelessclient'
     option linux_name 'wlan0-sta'
     list protocols 'client'
     list protocols 'babeld:17'

For some reason this solution does not propagate the default route obtained from Babeld to the whole network. This does not directly affect my project, and I expect to fix it in the upcoming days.
In case the usage of such perfectly-working trashware turns out to be a blocker, I will receive a few more supported routers in the following days and I will just use those.

Also due to the switch to Babeld, it is not yet possible to obtain a complete graph of the network (Babeld, being based on the distance vector principle, does not know the whole topology, and we’ll have to aggregate it using the new shared-state LibreMesh feature).

During the building of the test network, the planned topology changed a bit, resulting in this one (solid lines are cabled connections, directional dotted lines with arrows point from the client to the access point, non-directional dotted lines are proper IEEE802.11s mesh):

All the routers were flashed with an OpenWrt 18.06-SNAPSHOT image, which is OpenWrt 18.06.4 plus the additional fixes that appeared in the release branch here, compiled locally using the OpenWrt buildroot. The LibreMesh packages were also compiled in the same process but not included in the compiled image; they were installed later using opkg, serving the packages over the local network. This approach proved to be more convenient than expected. Additionally, the fallback image is a plain OpenWrt, which decreases the risk of “bricking” the devices.

The complete list of the installed packages from the LibreMesh ones is:

check-date-http first-boot-wizard hotplug-initd-services lime-app lime-debug lime-hwd-ground-routing lime-hwd-openwrt-wan lime-proto-anygw lime-proto-babeld lime-proto-batadv lime-proto-wan lime-system shared-state shared-state-babeld_hosts shared-state-dnsmasq_hosts shared-state-bat_hosts shared-state-persist shared-state-dnsmasq_leases shared-state-pirania

Lately, I also got involved in the development of lime-log-review, which uses liblognorm to decrease the volume of the logs and can be used in my project for storing the key information from the voluminous logs when an incident is detected.

Unittesting LibreMesh GSoC mid-term update

A few weeks have passed and I want to share the progress of the project and what the next steps are 🙂

Unit testing tools and ecosystem

As one of the goals is that it must be easy for developers to write, modify and run the tests, I created some simple tools to do this:

  • testing image -> Dockerfiles/Dockerfile.unittests
  • testing shell environment -> tools/dockertestshell
  • running the tests -> ./run_tests script

Testing image

In order to run the tests and have a reliable environment without the “it’s working on my computer” syndrome, I created a simple and very small Docker image. This image is based on an image with Lua 5.1 and luarocks made by abaez. On top of that it just installs the busted and luacov frameworks and bash.

FROM abaez/luarocks:lua5.1

WORKDIR /root

RUN luarocks install luacov; \
    luarocks install busted

# TODO: move into a development dockerfile
RUN apk add --no-cache bash bash-completion

Nixio library

It would be good to have nixio available inside the Docker image, because this library is widely used in LibreMesh and it could also be very handy for testing.

I made an effort to add it to the image but many problems arose. The luarocks version of nixio, 0.3-1, does not work, mainly because of some compilation issues with newer versions of gcc. So I tried to work on a rockspec without this problem, but I could not finish it because other problems arose, which I think are related to the Alpine/musl distribution and the libc/linux headers. I will try to help the author of nixio publish a new version to luarocks, as this will benefit others too.

Testing shell environment

To provide an easy way to develop or test things within the Docker image, I created a tool that opens a bash shell inside the Docker image, with some features that allow easy development:

  • /home/$USER is mounted inside the docker image so each change you do to the code from inside is maintained when you close the docker container
  • the same applies to /tmp
  • you have the same user outside and inside
  • network access is granted
  • and some goodies like bashrc, some useful ENV variables, PS1 modification, etc.

To enter the shell environment run:

[san@page lime-packages]$ ./tools/dockertestshell 
(docker) [san@page lime-packages]$

You can see that the prompt is changed by adding (docker) on the left, so you can easily remember that you are inside the Docker container.

This environment is also used by run_tests script.

Running the tests: run_tests bash script

run_tests

This script is what you should be running each time you want to run the tests. As you can see in the image, we currently have 19 tests and all are passing 🙂

For the sake of showing you what to expect when a test fails I modified a test condition to be false and here is the output:

run_tests_fail

Now 18 tests pass and there is one failure. The failing assertion is on line 11 and the test is “Fake uci tests test simple get and set”. The output also shows that the expected result is the number 2 but the actual result is the number 1.

As expected, run_tests returns 0 when all tests pass and a non-zero value when there is at least one failure.

The script in detail

The idea behind this script is simple:

  • sets the search path of the tests for busted (the unit testing framework)
  • sets the Lua library paths, prepending the fake library and adding the paths to the LibreMesh packages with packages/lime-system/files/usr/lib/lua/?.lua. This doesn’t work automatically for every package: it fails if the package’s paths do not follow the files/path/to/final/destination convention. So if you want to test a package that does not follow the files convention, it may be good to move the package to this convention. It also does not work if the Lua module we want to test does not end with .lua; in this case the path must be added explicitly (I wrote about this in a previous blog post).
  • runs the tests using the dockertestshell
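To make the path handling more concrete, here is a small sketch of how Lua-style path templates are matched against a module name: dots in the module name become directory separators, and each `?` in a template is replaced by the result, with templates tried from left to right. The function resolve_lua_module is a hypothetical helper written for this explanation; it only mimics the lookup and is not part of run_tests or busted.

```shell
#!/bin/sh
# Sketch: print the first file that a module name resolves to, given a
# ';'-separated list of Lua path templates. Run it from the repository root.
#   $1  module name, e.g. lime.config
#   $2  templates, e.g. 'tests/fakes/?.lua;packages/.../lua/?.lua;;'
resolve_lua_module() {
    name="$(printf '%s' "$1" | tr '.' '/')"       # lime.config -> lime/config
    printf '%s\n' "$2" | tr ';' '\n' | while IFS= read -r template; do
        [ -n "$template" ] || continue            # skip empty entries (";;")
        candidate="$(printf '%s' "$template" | sed "s|?|$name|g")"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            break                                 # first match wins
        fi
    done
}
```

Because tests/fakes/?.lua comes first in the list, a module such as uci resolves to the fake in tests/fakes/uci.lua (when that file exists) before any real library is considered. A template without the .lua suffix, such as packages/safe-upgrade/files/usr/sbin/?, is what lets a script named safe-upgrade be loaded as a module.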

run_tests also passes the first argument as an argument to busted so you can do things like this:

[san@page lime-packages]$ ./run_tests "--list --verbose"
packages/lime-system/tests/test_lime_config.lua:11: LiMe Config tests test empty get
packages/lime-system/tests/test_lime_config.lua:15: LiMe Config tests test simple get
packages/lime-system/tests/test_lime_config.lua:20: LiMe Config tests test get with fallback
packages/lime-system/tests/test_lime_config.lua:24: LiMe Config tests test get with lime-default
packages/lime-system/tests/test_lime_config.lua:30: LiMe Config tests test get precedence of fallback and lime-default
packages/lime-system/tests/test_lime_config.lua:36: LiMe Config tests test get with false value
packages/lime-system/tests/test_lime_config.lua:41: LiMe Config tests test get_bool
packages/lime-system/tests/test_lime_config.lua:54: LiMe Config tests test set
packages/lime-system/tests/test_lime_config.lua:64: LiMe Config tests test set nonstrings
packages/lime-system/tests/test_lime_config.lua:81: LiMe Config tests test get_all
packages/safe-upgrade/tests/test_safe_upgrade.lua:5: safe-upgrade tests test get current partition
tests/test_fake_uci.lua:4: Fake uci tests test simple get and set
tests/test_fake_uci.lua:14: Fake uci tests test multiple cursors
tests/test_fake_uci.lua:31: Fake uci tests test nested get and set
tests/test_fake_uci.lua:49: Fake uci tests test state not preserved between tests
tests/test_fake_uci.lua:54: Fake uci tests test save
tests/test_fake_uci.lua:59: Fake uci tests test delete
tests/test_fake_uci.lua:73: Fake uci tests test foreach
tests/test_fake_uci.lua:87: Fake uci tests test get_all

Here is the code:

$ cat run_tests 
#!/bin/bash

TESTS_PATHS='packages/*/tests/test*.lua  tests/test*.lua'
LIB_PATHS='tests/fakes/?.lua;packages/lime-system/files/usr/lib/lua/?.lua;packages/safe-upgrade/files/usr/sbin/?;;'

./tools/dockertestshell "busted -v ${TESTS_PATHS} --lpath='${LIB_PATHS}'" ${1}

Integration of unittests with Travis CI

LibreMesh already has a GitHub/Travis integration with two objectives:

  • test that the packages can be built (no Makefile errors, etc)
  • build and publish the packages of the master branch to an external server

The build pipeline of LibreMesh has been broken for a couple of months because the Docker image that was in use is no longer available. This is because there is an ongoing effort by aparcar to create canonical Docker images for OpenWrt.
So I made an attempt to fix the current LibreMesh build pipeline using the new infrastructure in this pull request. The pipeline is still not passing, but it seems to be something easy to fix: the build step passes and only the deploy step fails.

Travis unit testing

Besides fixing the current pipeline, and to integrate the unit testing work, I refactored the build steps to have a unittest stage and a build stage. To do this I installed the GitHub/Travis integration on my lime-packages fork. In the following image you can see that the two stages are green (tests are passing) 🙂

TravisPipeline

And here is the log of the unittests stage. You can see that it takes less than a minute to run the stage, with 15 seconds building the Docker image and 0.011133 seconds to run the tests 💯

Next Steps

Now that the framework is in place and running in continuous integration, we should be doing the following:

  • Add documentation on how to write tests
  • Integrate nixio in the docker image
  • Proofread the core LibreMesh code and inform about its testability
  • Provide some mocks for common functionality (uci already done!)

In the first weeks of August I will move to Catalunya to work with a core developer of LibreMesh, so together with my mentor NicoP we will adapt the schedule to take advantage of this.

Load-correlated distributed bandwidth analysis for LibreMesh networks – #2: Setting up the LibreMesh test network

In order to use the latest version of everything, I merged the latest commits from the LibreMesh community into my forked lime-packages repository.

Setting up the test network was more complex than expected.
I managed to collect a very diverse set of routers: 8 routers from 6 different manufacturers, covering 7 different models.
Two of these are officially supported by LibreMesh (TP-Link TL-WDR3600, Ubiquiti NanoStation Loco M2), while the others are supported by OpenWrt but not by LibreMesh (Comtrend AR-5387un, Huawei HG556a-C, Observa VH4032N, Comtrend AR-5315u, Astoria ARV7519RW22-A-LT).

The non-LibreMesh-supported routers cannot do either multi-AP or mesh via IEEE802.11s, but this was not expected to be a problem, as I took care to add support for AP-client networks, which do not need the routers to support IEEE802.11s mesh. Only the last mentioned router has no wifi support at all.
My solution was based on BMX6, which it seems will be dropped in the next LibreMesh release in favour of Babeld, and this will require an adaptation of the AP-client solution.

As mentioned in the previous post, I started by compiling my LibreMesh firmware based on the LibreRouter fork of the OpenWrt 18.06 repository.
When I flashed my routers and configured the wireless interfaces to use AP or client mode rather than the default AP+AP+IEEE802.11s, most of them showed strongly erratic behaviour.

So I decided to flash the routers with plain OpenWrt 18.06.2, without using the LibreRouter fork, and to install all the LibreMesh packages via opkg.
In order to ensure that the compiled packages would be compatible with the OpenWrt 18.06.2 release, the LibreMesh packages were compiled in my local buildroot of the OpenWrt branch openwrt-18.06.
Then the openwrt/bin/ directory was served via HTTP from my local machine.
In order to have the routers accept my local repositories, I had to install usign, create a key pair, sign the Packages file, push the public key to the routers and add the addresses of the local repositories to /etc/opkg/customfeeds.conf.
So, for example, the customfeeds.conf file of the Observa VH4032N router looks like:

src/gz local_base http://192.168.1.3/packages/mips_mips32/base
src/gz local_libremap http://192.168.1.3/packages/mips_mips32/libremap
src/gz local_libremesh http://192.168.1.3/packages/mips_mips32/libremesh
src/gz local_luci http://192.168.1.3/packages/mips_mips32/luci
src/gz local_packages http://192.168.1.3/packages/mips_mips32/packages
src/gz local_routing http://192.168.1.3/packages/mips_mips32/routing
src/gz local_brcm63xx_smp http://192.168.1.3/targets/brcm63xx/smp/packages
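Since every router needs the same set of lines with only the architecture and target changed, the file can be generated. The sketch below is an illustration only: make_feeds and sign_packages are hypothetical helper names, and the key file name passed to sign_packages is an assumption. The usign options used (-S to sign, -m for the file to sign, -s for the secret key) are the ones also used for signing OpenWrt package indexes.

```shell
#!/bin/sh
# Sketch: print the customfeeds.conf lines for one router.
#   $1 host serving openwrt/bin/   $2 package architecture
#   $3 target                      $4 subtarget
make_feeds() {
    host="$1"; arch="$2"; target="$3"; subtarget="$4"
    for feed in base libremap libremesh luci packages routing; do
        printf 'src/gz local_%s http://%s/packages/%s/%s\n' \
            "$feed" "$host" "$arch" "$feed"
    done
    printf 'src/gz local_%s_%s http://%s/targets/%s/%s/packages\n' \
        "$target" "$subtarget" "$host" "$target" "$subtarget"
}

# Sketch: sign one Packages index with a previously generated secret key.
# This writes the signature next to the index, as Packages.sig.
#   $1 directory containing the Packages file   $2 secret key file
sign_packages() {
    usign -S -m "$1/Packages" -s "$2"
}
```

For the router above, `make_feeds 192.168.1.3 mips_mips32 brcm63xx smp` reproduces the seven lines shown; its output can be redirected into /etc/opkg/customfeeds.conf on the router.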

Once completely configured, the network structure planned is represented in black in the following scheme.

Planned test network structure.

In order to better test on a proper mesh, I ordered 3 additional routers fully supported by LibreMesh: YouHua WR1200JS (see here and here) from here.
They come with OpenWrt pre-installed and they fully support multi-AP + IEEE802.11s.
Once I receive these additional routers, I will be able to add the mesh part of the test network, as indicated in red in the scheme.

Regarding the load analysis of the network, the first approach will be to obtain this value from the number of clients currently connected to the network.
This number will be obtained in at least the following ways:

batadv-vis -f jsondoc | sort -u | wc -l

ip neigh show nud reachable | wc -l

In the meantime, a minor enhancement has been suggested and two others were accepted.

GSoC 2019 – Unit testing LibreMesh – Update 1

In the last weeks I have been involved in getting deeper into becoming part of the development team of LibreMesh.

During that process, I worked together with NicoPace in writing this blogpost where we build a solid ground for unit testing: https://blog.freifunk.net/2019/06/03/gsoc-2019-evaluating-options-to-do-unit-and-integration-tests-in-libremesh-and-a-first-working-example/

Not covered by the last blog post is the work that I did on a fake/mock implementation of the libuci library in Lua. This allows writing a lot of tests for LibreMesh, as uci is the most used library in the codebase for which it makes sense to write a mock. The implementation is very small but covers the most used functionality of libuci: cursor(), get(), set(), save(), delete() and foreach(). This was implemented doing TDD with the support of the unit testing framework.

All this work is being done in the following branch of my lime-packages fork: https://github.com/spiccinini/lime-packages/commits/unittest_docker

During the upcoming weeks all this work will be properly released as a PR to the lime-packages repo, accompanied by the Travis CI integration in a Docker container to run the tests in a contained environment, and more tests are going to follow 🙂

GSoC 2019 – Evaluating options to do unit and integration tests in LibreMesh (and a first working example)

Prior experience

Some people have been writing about unit testing in Lua, and also about Lua for embedded systems:

  • http://lua-users.org/wiki/UnitTesting
  • https://blog.freifunk.net/2019/05/26/gsoc-2019-unit-testing-libremesh/

Requirements

The requirements of the LibreMesh project with regard to testing are the following:

  • Must support Lua 5.1, as it is the one packaged in OpenWrt
  • Must provide helpful assert test functions, like showing table diffs or formatters for outputs to understand the difference easily
  • We need mock functionality, because a lot of functions are hardware related and may not be possible to test them on hardware all the time.
  • It is desirable for the library to be a one-file import, so we can use it in the routers and in the continuous integration in the same way we do it in the desktop.

Options

We need to consider unit testing, mocking and coverage tests.

Unit Testing

There is a list of unittesting libraries in the lua package manager, luarocks:
https://luarocks.org/labels/test?non_root=on

These are the ones analyzed:

LuaUnit

URL: https://github.com/bluebird75/luaunit

Upsides:

  • No external dependencies (single file),
  • it’s well maintained,
  • popular (200k downloads luarocks, 200 stars Github),
  • supports multiple versions of Lua.
  • Has TAP support (for CI)
  • It is being used in other OpenWRT based images like OpenWISP: https://github.com/openwisp/openwisp-config/blob/master/openwisp-config/tests/test_utils.lua

Downsides:

  • It doesn’t have mocking helpers: Could be combined with a one file mocking library mockagne

Telescope

They describe themselves as “A highly customizable test library for Lua that allows declarative tests with nested contexts”.
URL: https://github.com/norman/telescope/

The last release was in 2013. The last Lua 5.1 release was in 2012, so this is not that big of a deal, but the library has not received any updates since then, so it may not have evolved.

Busted

URL:

  • https://github.com/Olivine-Labs/busted
  • http://olivinelabs.com/busted/

Upsides:

  • Very well maintained by olivinelabs and contributors,
  • very popular (900k downloads luarocks, 800 github stars).
  • It has setup/teardowns and also mocks, spies, and matchers.
  • Has TAP support.
  • Has good documentation
  • It is integrated with luacov for test coverage

Downsides:

  • Must be installed using luarocks (it is a lot of files). A question has been posted to luarocks to explore the possibility of creating bundles for a package (one file with all dependencies). That would simplify its use: https://github.com/luarocks/luarocks/issues/1023

Mocking libraries

lua-mock

URL: https://github.com/henry4k/lua-mock

Mach

URL: https://github.com/ryanplusplus/mach.lua/

More or less well maintained, though it is not so popular.

Coverage reports

luacov

URL: https://github.com/keplerproject/luacov

Upsides:

  • Well maintained

Unit testing Architecture for LibreMesh (only for LibreMesh?)

The idea is to allow unit-testing packages and also the integration between them as some of the packages depend on other packages.

Context:

  • LibreMesh enables functionality by selecting which packages are installed and by enabling/disabling the exposed features in configuration files.
  • In some packages the code is all inside the executable Lua file (not structured as a library)
  • Some packages are independent and provide functionality without depending on lime-system. These packages are in lime-packages for convenience.
  • Packages could (should) be migrated into the OpenWrt repositories. This migration may happen in steps, and once migrated the code may live in an independent repository for the package itself.
  • Many packages are wrappers of bash code, and this complicates the tests, as you need a running system to test them

This context is not easy to test, as it involves a lot of trade-offs!

Options

Single and global tests directory

The easiest architecture is to have a global tests directory and some utility functions that allow us to “install” a certain module for testing.

Directory structure:

lime-packages/package/package1/
lime-packages/package/package1/...
lime-packages/package/package2
lime-packages/package/package2/...
lime-packages/tests/utils.lua
lime-packages/tests/fake_modules/nixio.fs
lime-packages/tests/test_package_1.lua
lime-packages/tests/test_package_2.lua
lime-packages/tests/test_package_1_and_2_integration.lua
lime-packages/run_tests.sh

Example of an (integration) test that uses libraries and fake modules
test_lime_proto_anygw.lua:

local utils = require("tests.utils")
-- install the required modules in the Lua path
utils.install_limesystem_module() -- to allow access to lime.network, etc.
utils.install_module("packages/lime-proto-anygw/src/anygw.lua", "lime.proto.anygw")
utils.install_module("tests/fake_modules/nixio.fs", "nixio.fs")

-- now we can load the modules
local anygw = require("lime.proto.anygw")

function test_foo()
  -- assert is a plain function in Lua, so it needs parentheses
  assert(anygw.foo() == 'bar')
end

Pros:

  • Easy to start with and to understand

Cons:

  • As all tests are together, it is not easy to move a package to another repository or even to its own repository.
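
A helper such as utils.install_module is not shown in this post. As a minimal sketch, assuming it only needs to make a file loadable under a chosen module name, it could register a loader in package.preload. The function body, the temporary file and the module name lime.proto.example are all hypothetical and only serve the illustration:

```lua
-- Hypothetical helper: register the Lua file at `path` under the module
-- name `name`, so that a later require(name) loads that file.
local function install_module(path, name)
  package.loaded[name] = nil -- drop any copy cached by an earlier test
  package.preload[name] = function(...)
    local chunk = assert(loadfile(path))
    return chunk(...)
  end
end

-- Demo: write a tiny module to a temporary file and install it.
local path = os.tmpname()
local file = assert(io.open(path, "w"))
file:write("local M = {}\nfunction M.foo() return 'bar' end\nreturn M\n")
file:close()

install_module(path, "lime.proto.example")
local example = require("lime.proto.example")
print(example.foo())
os.remove(path)
```

Using package.preload keeps package.path untouched, so each test decides exactly which files are visible as modules.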

Tests inside each module and a shared tests directory to test integrations

Directory structure:

lime-packages/package/package1/
lime-packages/package/package1/tests/test_foo.lua
lime-packages/package/package2
lime-packages/package/package2/tests/test_bar.lua
lime-packages/tests/utils.lua
lime-packages/tests/fake_modules/nixio.fs
lime-packages/tests/test_package_1_and_2_integration.lua
lime-packages/run_tests.sh

Pros:

  • Each package has more independence

Cons:

  • ?

Testing in a fully working image with all packages and libraries installed

Tests can be run installing all files of the packages (by some script that parses the Makefiles, or “by hand in a helper script”).

Pros:

  • It requires less boilerplate to test package integration
  • Libraries of the target system can be used directly
  • Other packages may be installed

Cons:

  • Less control over what is really happening
  • slower than the other options, as it needs to load a full system

Fake modules as library

Testing executable modules

Executable Lua modules can be tested with a simple modification of the file: move the main code into a main() function and then use something like:

function main()
  --- the main code in here
end

-- detect if this module is run as a library or as a script
if pcall(debug.getlocal, 4, 1) then
  -- Library mode, do nothing
else
  -- Main script mode
  main()
end

Then, from a test file, it can be loaded like any normal module and all the functions can be accessed without executing main().

Testing environment

A docker environment (or multiple, even using “qemu-user” under docker) with the testing libraries and target lua version is loaded by the “run_tests.sh” executable.
This environment can provide some useful libraries for testing (coverage reporting,

Direct import and testing can be done for unit tests where functions are not using system libraries, or when these are simple enough to be mockable (mocking shell() calls).
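
As an illustration of mocking shell() calls without any framework, here is a minimal sketch in plain Lua. The utils table, its shell and hostname functions, and the with_fake helper are hypothetical names chosen for this example, not LibreMesh code:

```lua
-- Hypothetical module under test: hostname() depends on a shell command.
local utils = {}

function utils.shell(command)
  local handle = assert(io.popen(command))
  local output = handle:read("*a")
  handle:close()
  return output
end

function utils.hostname()
  -- strip trailing whitespace; the parentheses keep only the first
  -- return value of gsub
  return (utils.shell("uname -n"):gsub("%s+$", ""))
end

-- Temporarily replace tbl[key] with `fake` while fn runs, then restore
-- the original even if fn raises an error.
local function with_fake(tbl, key, fake, fn)
  local original = tbl[key]
  tbl[key] = fake
  local ok, err = pcall(fn)
  tbl[key] = original
  if not ok then error(err, 0) end
end

local seen_command
with_fake(utils, "shell", function(command)
  seen_command = command
  return "node-1\n"
end, function()
  assert(utils.hostname() == "node-1")
end)
```

This only works because hostname() looks up utils.shell at call time. A function that captured shell in a local variable would have to be refactored before it could be mocked this way.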

luarocks router-local environment

described by this guy, thanks!

We did a trial to run busted inside a router… that would have been useful for in-router tests and also for tests inside a virtual environment.

It used the strategy of installing luarocks dependencies in a separate directory and copying them to the router.

The steps are pretty straightforward:

$ sudo apt install luarocks
$ luarocks install --tree lua_modules busted
$ cat > test.lua <<'EOF'
require 'busted.runner'()

describe('Busted unit testing framework', function()
  describe('should be awesome', function()
    it('should be easy to use', function()
      assert.truthy('Yup.')
    end)

    it('should have lots of features', function()
      -- deep check comparisons!
      assert.same({ table = 'great'}, { table = 'great' })

      -- or check by reference!
      assert.is_not.equals({ table = 'great'}, { table = 'great'})

      assert.falsy(nil)
      assert.error(function() error('Wat') end)
    end)

    it('should provide some shortcuts to common functions', function()
      assert.unique({{ thing = 1 }, { thing = 2 }, { thing = 3 }})
    end)

    it('should have mocks and spies for functional tests', function()
      local thing = require('thing_module')
      spy.on(thing, 'greet')
      thing.greet('Hi!')

      assert.spy(thing.greet).was.called()
      assert.spy(thing.greet).was.called_with('Hi!')
    end)
  end)
end)

EOF
$ cat > set_paths.lua <<'EOF'
-- set_paths.lua
local version = _VERSION:match("%d+%.%d+")
package.path = 'lua_modules/share/lua/' .. version .. '/?.lua;lua_modules/share/lua/' .. version .. '/?/init.lua;' .. package.path
package.cpath = 'lua_modules/lib/lua/' .. version .. '/?.so;' .. package.cpath
EOF
$ scp -r test.lua set_paths.lua lua_modules root@thisnode.info:~
$ ssh root@thisnode.info 'lua -l set_paths test.lua'

The output of this command though was not what we expected:

$ ssh root@thisnode.info 'lua -l set_paths test.lua'
lua: lua_modules/share/lua/5.1/pl/path.lua:28: pl.path requires LuaFileSystem
stack traceback:
    [C]: in function 'error'
    lua_modules/share/lua/5.1/pl/path.lua:28: in main chunk
    [C]: in function 'require'
    lua_modules/share/lua/5.1/busted/runner.lua:3: in main chunk
    [C]: in function 'require'
    test.lua:1: in main chunk
    [C]: ?

Deeper inspection showed that the library’s dependencies had C bindings that had been compiled for a different architecture, so that strategy was no longer feasible for routers:

find . -name \*.so
./lua_modules/lib/lua/5.1/lfs.so
./lua_modules/lib/lua/5.1/term/core.so
./lua_modules/lib/lua/5.1/system/core.so

where term/core.so and system/core.so are system libs, but lfs.so is from LuaFileSystem.

There is a Lua-only implementation of LuaFileSystem: https://github.com/sonoro1234/luafilesystem , but as it doesn’t support luarocks, a deeper understanding of the platform is needed to attempt to replace the only binary binding with this implementation.

I found a sister library of that one in luarocks: https://luarocks.org/modules/3scale/luafilesystem-ffi based on this repo: https://github.com/spacewander/luafilesystem. So:

$ luarocks install --tree lua_modules luafilesystem-ffi

I installed it and then modified the code where the penlight library, used by busted, requires lfs:

$ grep -r require.\*lfs *
path.lua:local res,lfs = _G.pcall(_G.require,'lfs')
$ pwd
/home/nico/tmp/lua_local_test/lua_modules/share/lua/5.1/pl

but since this library depends on ffi (a LuaJIT module), it depends on a C extension too.

Also, luafilesystem exists as a native package in OpenWRT, so it could be included just for the sake of the exercise: https://openwrt.org/packages/pkgdata/luafilesystem … but not this time.

Docker with LuaRocks and Lua 5.1

Some docker images already exist:

  • https://github.com/akornatskyy/docker-library/
  • https://hub.docker.com/r/abaez/luarocks/
  • https://hub.docker.com/r/abaez/lua
  • https://github.com/martijnrondeel/docker-luarocks

A simple Dockerfile would be:

FROM abaez/luarocks:lua5.1

WORKDIR /root

RUN luarocks install luacov && \
    luarocks install busted

Excellent blog post on handling Lua paths: http://www.thijsschreijer.nl/blog/?p=1025

Example of LUA_PATH to load executables (without ending in .lua): LUA_PATH="packages/safe-upgrade/files/usr/sbin/?;;". The double ;; at the end means append the default paths.
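
To illustrate how the ? placeholder is used, the following sketch lists the file names that require would try for a given module name. candidate_files is a hypothetical helper written only for this explanation. It simplifies the real lookup: it ignores package.preload and package.cpath, and it does not expand ;; into the default path.

```lua
-- Hypothetical helper: expand every template of a LUA_PATH-style string
-- for one module name, in the order in which they would be tried.
local function candidate_files(search_path, module_name)
  -- dots in a module name stand for directory separators
  local file_name = module_name:gsub("%.", "/")
  local files = {}
  for template in search_path:gmatch("[^;]+") do
    -- a function replacement avoids any special handling of "%" in names
    files[#files + 1] = (template:gsub("%?", function() return file_name end))
  end
  return files
end

local search_path = "packages/safe-upgrade/files/usr/sbin/?;./?.lua"
for _, file in ipairs(candidate_files(search_path, "safe-upgrade")) do
  print(file)
end
```

The first template has no .lua suffix, which is why the extension-less executable safe-upgrade can be found as a module.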

First attempt running tests

I selected the safe-upgrade LibreMesh module to start writing unit tests because I wrote it, so I already know which code would gain value from being tested. I am also confident about refactoring the module if needed.

First I started using the busted unit testing library with a simple test of the function get_current_partition(), which must return the number of the partition that is currently running. As this is done by reading /proc/mtd, I refactored the function so that we can pass in the expected content from the outside.

Content of lime-packages/safe-upgrade/tests/test_safe_upgrade.lua:

local su = require "safe-upgrade"

describe("safe-upgrade tests", function()

    it("test get current partition", function()

        proc_mtd = [[#!
        dev:    size   erasesize  name
        mtd0: 00020000 00010000 "factory-uboot"
        mtd1: 00020000 00010000 "u-boot"
        mtd2: 00180000 00010000 "kernel"
        mtd3: 00d40000 00010000 "rootfs"
        mtd4: 00b10000 00010000 "rootfs_data"
        mtd5: 000f0000 00010000 "config"
        mtd6: 00010000 00010000 "firmware"
        mtd7: 00ec0000 00010000 "fw2"
        mtd8: 00ec0000 00010000 "ART"
        ]]
        assert.is.equal(su.get_current_partition(proc_mtd), 1)

        proc_mtd = [[#!
        dev:    size   erasesize  name
        mtd0: 00020000 00010000 "factory-uboot"
        mtd1: 00020000 00010000 "u-boot"
        mtd2: 00180000 00010000 "kernel"
        mtd3: 00d40000 00010000 "rootfs"
        mtd4: 00b10000 00010000 "rootfs_data"
        mtd5: 000f0000 00010000 "config"
        mtd6: 00010000 00010000 "fw1"
        mtd7: 00ec0000 00010000 "firmware"
        mtd8: 00ec0000 00010000 "ART"
        ]]
        assert.is.equal(su.get_current_partition(proc_mtd), 2)

    end)
end)

The modifications I had to make to the safe-upgrade module are:

  • refactor get_current_partition() into get_proc_mtd() and get_current_partition(proc_mtd). This way we can inject different /proc/mtd values for testing.
  • return a table containing the module exported functions when running in library mode. In this case we are exporting get_current_partition.
  • move argparse module loading to the parse_args function that only gets executed when the module is run in script mode (not library mode)

These changes may not be the best way of handling testing, but for now they allow us to move forward without digging a hole too deep:

[san@jones lime-packages]$ git diff
diff --git a/packages/safe-upgrade/files/usr/sbin/safe-upgrade b/packages/safe-upgrade/files/usr/sbin/safe-upgrade
index 8aeece4..fc6d467 100755
--- a/packages/safe-upgrade/files/usr/sbin/safe-upgrade
+++ b/packages/safe-upgrade/files/usr/sbin/safe-upgrade
@@ -17,7 +17,6 @@
 ]]--

 local io = require "io"
-local argparse = require 'argparse'

 local version = '1.0'
 local firmware_size_bytes = 7936*1024
@@ -114,10 +113,15 @@ function get_current_cmdline()
     return data
 end

-function get_current_partition()
+function get_proc_mtd()
     local handle = io.open('/proc/mtd', 'r')
     local data = handle:read("*all")
     handle:close()
+    return data
+end
+
+function get_current_partition(proc_mtd)
+    local data = proc_mtd or get_proc_mtd()
     if data:find("fw2") == nil then
         return 2
     else
@@ -289,6 +293,7 @@ end


 function parse_args()
+    local argparse = require 'argparse'
     local parser = argparse('safe-upgrade', 'Safe upgrade mechanism for dual-boot systems')
     parser:command_target('command')
     local show = parser:command('show', 'Show the status of the system partitions.')
@@ -338,6 +343,9 @@ end
 -- detect if this module is run as a library or as a script
 if pcall(debug.getlocal, 4, 1) then
     -- Library mode
+    local safe_upgrade = {}
+    safe_upgrade.get_current_partition = get_current_partition
+    return safe_upgrade
 else
     -- Main script mode

To run the test inside the docker container we have to add the module under test to the LUA_PATH (note that it is an executable module whose name does not end in .lua, so the expression is ? instead of ?.lua):

(docker) [san@jones lime-packages]$ LUA_PATH="packages/safe-upgrade/files/usr/sbin/?;;" busted packages/safe-upgrade/tests/test_safe_upgrade.lua
●
1 success / 0 failures / 0 errors / 0 pending : 0.001127 seconds
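
Stripped of the diff context, the refactor amounts to passing the file content as an optional argument. The sketch below is a standalone version in plain Lua. The return 1 branch is an assumption, since the diff above is cut right after else. It was chosen so that the result agrees with the test shown earlier.

```lua
-- Read the real partition table; only used when no content is injected.
local function get_proc_mtd()
  local handle = assert(io.open('/proc/mtd', 'r'))
  local data = handle:read("*a")
  handle:close()
  return data
end

-- proc_mtd is optional: tests pass a string, the router reads /proc/mtd.
local function get_current_partition(proc_mtd)
  local data = proc_mtd or get_proc_mtd()
  if data:find("fw2", 1, true) == nil then
    return 2
  else
    return 1 -- assumption: this branch is not visible in the diff
  end
end

-- Injected content, so this also runs on a machine without /proc/mtd.
local fake_mtd = 'mtd6: 00010000 00010000 "firmware"\nmtd7: 00ec0000 00010000 "fw2"\n'
print(get_current_partition(fake_mtd))
```

The default argument keeps the command line behaviour unchanged, while the test never touches the real /proc/mtd.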

Sum up

  • We evaluated testing libraries and narrowed our selection down to busted or luaunit. We are selecting busted as it has more pros than luaunit, mainly integrated mocking and coverage.
  • Some architectural options were proposed as starting point. We discussed them with @nicopace and we will be moving forward iterating with the Tests inside each module and a shared tests directory to test integrations idea.
  • Running tests in a local OpenWrt based device was investigated (thanks @nicopace!)
  • A working Dockerfile is proposed.
  • I did a real world example of unit testing a single function of a simple module. Little but working 🙂

GSoC 2019 – Load-correlated distributed bandwidth analysis for LibreMesh networks

Introduction

Performance tests are key for identifying bottlenecks and optimizing the network topology.
The main indicator is bandwidth, but other values can also be useful, such as latency, the number of active users on each node, and the load average and RAM consumption of each node.
These quantities can vary greatly between peak time and night time, so some of the measurements should be carried out at both times.
Other measurements, which can affect the user experience, could instead be run only at night.
To identify the night time we can’t rely on the router’s internal clock, which could be years away from the actual time.
So a method for determining the network-wide peak time will be sought.
Each router in the network should run these tests separately, and to avoid influencing each other’s results they should run at different times.
This synchronization should be possible taking advantage of the LibreMesh architecture and the shared-state service.

About me

I’m Ilario. I studied organic chemistry in Pisa, Italy, and I’m currently doing a PhD on perovskite solar cells in Tarragona, Catalonia, Spain.
During the university I contributed to the mesh network eigenNet, part of the Italian community network consortium Ninux.
I started the (nowadays stalled) NinuxVerona community and once in Spain I started actively contributing to GuifiCamp and LibreMesh.

Setup of the development environment and initial interactions with the LibreMesh community

After proposing a fix, I managed to build the LibreMesh firmware at its current stable release (17.06) using lime-sdk.
Then I built the latest LibreMesh code on top of the forked OpenWrt 18.06 buildroot, as suggested by the mentors. At first this was not possible on Arch Linux, but after I contacted the community they updated the forked OpenWrt repository and it worked, thanks!
Finally, in order to have the most up-to-date OpenWrt code available, I compiled the latest LibreMesh code on top of the trunk (master branch, the unstable version) of the OpenWrt buildroot. This was possible after adapting some configuration to the latest OpenWrt.
For pushing my code I forked lime-packages repository and created a gsoc2019 branch which can be accessed here.
Additionally, in case modifications to OpenWrt 18.06 core were needed, I will push them here.
All the buildroot-based compilation methods are already set up with the new branch as a feed, while the possibility of a backport to the stable LibreMesh 17.06 release will be evaluated once the project is completed.

Objectives

  • Flash with LibreMesh 4+ routers (preferably different models with different performances, if needed buy some) and setup a test network;
  • define a set of information to collect, divide it into network safe (e.g. number of clients) and network intensive (e.g. bandwidth test), and understand how to collect this data;
  • understand how a Prometheus exporter works and develop one in lua for the “network safe” quantities;
  • choose a reasonable “network safe” quantity for identifying the usage peak of the whole network (e.g. number of clients);
  • develop a script that locally identifies the peak and the night time;
  • develop the scripts for the network intensive tests, which should also store the results on the flash memory;
  • discuss with the mentors if the previous logs can be overwritten or if they should accumulate on the router for a certain period of time, in the latter case implement it;
  • implement a strategy for avoiding network intensive tests on different routers to happen at the same time;
  • if for achieving this last point a synchronization of the routers’ clocks is unavoidable, find a converging way for doing so or an available tool which does not require internet access (no NTP);
  • write a small Prometheus exporter for serving the last peak and off-peak network intensive tests results;
  • write the init service;
  • create a Makefile for the package;
  • test in a real-world community;
  • adapt the code written for LibreMesh trunk version to run also on LibreMesh 17.06 release;
  • adapt the code to plain OpenWrt, evaluate needed dependencies, if possible push the created package to upstream repository.

GSoC 2019 – Unit testing LibreMesh

Project introduction

LibreMesh, as an embedded operating system, depends a lot on the underlying hardware. However, some parts of the code don’t have that dependency, nor do they depend on the network or on any particular state that the device could be in. Also, there are many other cases where the states that one would like to reach in order to reproduce a situation are complex or impractical to reproduce with hardware. In this project I will integrate a testing and mocking framework into LibreMesh and provide the functionality needed to easily write new tests for existing or new code. I will also add tests for the core functions of LibreMesh.

Motivation

Unit testing the LibreMesh code-base will greatly help in approaching these two situations, and will help provide a much more robust solution for the communities it serves. Having automated unit and integration tests may improve the quality and the development speed, and shorten the release cycles of the LibreMesh software.

Also, having tests that safeguard the core functionality may allow new developers to engage with the code-base with more confidence.

For some developers (like me), having the option of doing test driven development greatly enhances the development experience. For reviewers it is also easier to understand and maintain code that has unit tests.

About me

I am Santiago Piccinini, an Electronic Engineering student at the University of Buenos Aires, Argentina. In my studies I focused on wireless, communication protocols, signal analysis, electronics prototyping and software development. I am currently finishing my master thesis.

I have been involved in different projects related to Community Networks, lately focused in the LibreRouter project.

This is my first GSoC, I wanted to participate for a long time!

Deliverables

  • Integrate unit testing and mocking framework to LibreMesh, that would allow the code-base to be tested with the specific lua version of the target system
  • Integrate tests with existing testing infrastructure (Travis CI)
  • Incorporate coverage reports and increase the coverage level of LibreMesh code
  • Proofread report of LibreMesh code testability, particularly what needs to be mocked of the code in order to be tested, and a rough idea of the complexity of the refactor needed.
  • At least one pull request of a refactor task for different levels of complexity as examples for the community to follow.
  • Try to write mocks for common functionality, for example:
    • Iwinfo
    • Nixio.fs, etc
    • Uci, lime.config
  • Write tests for some core functionality of lime-system package.
  • Refactor some LibreMesh code if needed for easy testing.
  • Add a device emulation module that provides specific mocking of device details (iwinfo, /sys/class/ieee80211/, etc) to allow writing integration tests.
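
As a rough idea of what such a mock could look like, here is a minimal sketch of an in-memory stand-in for nixio.fs. The new_fake_fs helper is hypothetical, and only readfile, writefile and access are faked, with simplified return values compared to the real library:

```lua
-- Hypothetical factory for a fake nixio.fs backed by a Lua table that
-- maps paths to file contents.
local function new_fake_fs(files)
  local fs = { files = files or {} }

  function fs.readfile(path)
    return fs.files[path] -- nil when the "file" does not exist
  end

  function fs.writefile(path, data)
    fs.files[path] = data
    return true
  end

  function fs.access(path)
    return fs.files[path] ~= nil
  end

  return fs
end

-- Register the fake before the code under test requires the module.
package.loaded["nixio.fs"] = new_fake_fs({
  ["/proc/mtd"] = 'mtd7: 00ec0000 00010000 "fw2"\n',
})

local fs = require("nixio.fs")
print(fs.readfile("/proc/mtd"))
```

Because require returns whatever is stored in package.loaded, the code under test needs no changes to pick up the fake.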

Next steps

I will start by researching open-source Lua unit testing and mocking frameworks, starting from this blog post https://blog.freifunk.net/2017/06/29/tdd-unit-testing-lua-openwrtlede-case/ from @nicopace. Next I will discuss the pros and cons of each framework with the mentors, propose a reasonable architectural solution to integrate the chosen framework, with some testing code as an example, and discuss it with the LibreMesh developers. If everything goes smoothly then I will integrate it with Travis CI.

GSoC 2018 – Ground Routing in LimeApp – 1st update

Overview

In this past month I was working on the update of the lime-app dependencies (it was quite outdated). I also worked on the view and the ubus module that reads and saves ground routing settings in the LiMe config file.

The view: (Github LimeApp branch)

It is the minimum configuration of a plugin for lime-app. It has defined the constants, the store, actions (set and get) and basic epics to obtain the data using uhttp-mod-ubus.

Lime-app uses Preact for rendering the views, redux for state management and rxjs-observable as middleware for asynchronous events. For now you only get the setting as a json and expose it to the user.

Ubus (Github lime-package-ui branch)

Create the lime-groundrouting package that exposes and sets the ground routing configuration in lime. For the time being, it just exposes the settings.

ubus call lime-groundrouting get

To do this I use the Lua library lime.config.

Next step: Save changes.

In the coming weeks I will mount the form and the validation scheme in both the app and the ubus module.