GSoC 2019 – Monitoring of a community network, first results

1 Intro

Like any network, in community networks it is important to know the status of each of the teams that compose it, to track these over time and identify possible problems.
To monitor the routers of the network we need to store the data of all the equipment, centralize, analyze and visualize it. The metrics that we reveal can be divided into two large groups:

  • Numbers such as uptime, sent packets, signal strength, etc.
  • Those that are text like the logs.

In this first stage we are going to concentrate on those of the first type.

2 Collecting the metrics

Prometheus is one of the most used free software for event and alert monitoring. The clients (the routers in this case) expose a http server which Prometheus scrapes with a periodic frequency (HTTP pull model) and saves the data in a time database. Prometheus defines 4 types of metrics that are used to generate each instrument (what we are going to measure).
Grafana allows us to connect with Prometheus and generate different personalized dashboards and extend the existing graphics to our needs. It also allows us to generate a system of alerts.

2.1 The setup for the first metrics

In our experimental setup we will create a mesh network with a router that has LibreMesh. In addition we will have a raspberryPi in the network where we will have installed Promemtheus and Grafana.

2.2 Prometheus client

As mentioned previously, each router will be running a Prometheus client which will serve via http a plain text with the client’s metrics.

2.2.1 Python vs Lua

Prometheus offers a library in python to do the implementation of the client. The problem is that when we are working on routers, the space available to install applications is very limited. The interpreter of python weighs 50MB but the interpreter of Lua only 4kB

2.2.2 A new client

OpenWRT has a client implementation of Prometheus on Lua, but we found two “problems” in this implementation :

  • Use lua-socket to create the server which need install an external package. We are going to use uhttpd which is a server already installed on the router and allow us to execute a Lua script in a custom url.
  • Each instrument is written from scratch. We want to implement 4 basic objects (the possible metrics) so that it is simple to extend to new instruments.

2.2.3 Initial Metrics

Initially we will measure:

  • Uptime
  • Load avg
  • Mem info
  • Package per interface
  • iwinfo
  • Chanel occupation

3 The first results

Some metrics collected by Prometheus visualized in a Grafana dash

Next steps

In the next week we will be working on:

  • Add missing instruments
  • Pack the client correctly
  • Testing in a real mesh network
  • Document these steps

GSoC 2019 – Log Monitoring on LibreMesh

Log analysis is the way to find and recover the problems (known or not) of hardware, software or “rare” traffic on the network. In addition to the technical problem involved in unifying the logs of various routers and analyzing them, in community networks we frequently encounter the following additional problems:

  • Temporary disconnections. Due to geographical and / or atmospheric conditions, some equipment is temporarily disconnected from the network, so we have to design a system that allows us to “transfer” those logs.
  • Several of the routers used in community networks have limited resources to store the logs.
  • Several community networks do not have a sys admin within the network or they may not have Internet access (total or only temporary) to receive external help.

The idea of this GSoC is to develop a system that allows us to unify the logs of the routers of the network, filter them to stay only with the relevant ones to analyze and generate automatic analyzes (for example analysis of correlation of logs) to find possible problems and report them to the community.
Also, we want to develop a general dashboard of the state of the community network.

About me

I’m Franco Bellomo, a student of Exact Sciences at the University of Buenos Aires. My area of study is mathematical analysis and computational modeling. My previous free soft projects are related to academic problems so I am very happy with this new challenge.

I am an activist for free knowledge and I am very motivated to contribute to community networks.

Goals

  • Normalize and decrease the size of the logs. For this I want to perform a test between developing and training an own model (starting from a Huffman tree) in comparison with using (liblognorm) [https://www.liblognorm.com/].
  • Centralize Within the network there will be an RPi which will help us to join the standardized logs. In this step you will not consider the teams that are temporarily out.
  • General Dashboard. Visualization of the topology of the network and the status of each team.
  • Analysis of traffic and outlier in the network.
  • The records within the log are labeled (debug, info, warning, etc). We want to generate a system of auto tageo of groups of registries, that is to say that combinations of logs are potentially dangerous. For this we are going to use algotirmos of classification.
  • Extraction of features of the logs to make an unsupervised model of anomalies detection.
  • Obtain the log of the connected routers. For this we are going to use the community telephones as bridges.
  • Generate a good documentation!