What we have so far
Last month I introduced my test setup intended for fast kernel trials and network development. After that I updated my shadowsocks-libev fork to 3.2.0, the latest upstream stable version. This fork does not do any encryption, which is less secure but faster – and it is what makes our new approach possible: since the data never has to be modified in user space, we can push the data plane into the kernel.
Possible solutions
The problem recently emerged in a different environment: the cloud/datacenter space. In the cloud, transmission between containers (like Docker) happens exactly as in our SOCKS proxy case: from user space to the kernel, back to user space (through the proxy), back to the kernel, and finally to user space again. Lots of unnecessary copies. There was an attempt to solve that: kproxy. This solution works pretty well, but there are two drawbacks: it is not merged into the kernel (the main part is a module, but it also modifies kernel headers), and in my tests it is slower than the regular proxy with the extra copies. Sadly I don't know the exact cause, but in my loopback tests on a patched 4.14 kernel it was about ~30% slower than a regular proxy.
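For reference, here is a minimal sketch (plain POSIX sockets, the fd names are illustrative) of the user-space relay loop that causes those copies: every chunk of data is copied from kernel to user by recv() and back from user to kernel by send().

```c
#include <sys/socket.h>

/* Relay bytes from client_fd to remote_fd until EOF.
 * Each chunk crosses the user/kernel boundary twice. */
static int relay(int client_fd, int remote_fd)
{
	char buf[16 * 1024];
	ssize_t n;

	while ((n = recv(client_fd, buf, sizeof(buf), 0)) > 0) {
		ssize_t off = 0;

		while (off < n) {
			ssize_t w = send(remote_fd, buf + off, n - off, 0);

			if (w < 0)
				return -1;
			off += w;
		}
	}
	return n == 0 ? 0 : -1;
}
```

A proxy does this in both directions for every connection, which is exactly the copying that the in-kernel approaches below try to avoid.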
AFAIK kproxy is not in development anymore, because there is a better solution, zproxy, which features TCP zero-copy, but it is not released yet. However, some parts of the original kproxy code are already merged into the kernel as part of the eBPF socket redirect feature: https://lwn.net/Articles/730011/
This is nice because it is standard and already in the vanilla 4.14 kernel, but it is a bit more complicated to instrument, so I will test it later.
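To make the idea concrete, here is a minimal sockmap sketch of the socket redirect approach. This is written against a current libbpf toolchain (the helper and section conventions have shifted a bit since 4.14), and the map and program names are only illustrative, not the real instrumentation I will test.

```c
// SPDX-License-Identifier: GPL-2.0
/* Minimal sockmap redirect sketch: user space puts the two already
 * established proxy sockets into sock_map, after which the verdict
 * program short-circuits the data between them inside the kernel. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 2);      /* slot 0: client, slot 1: upstream */
	__type(key, int);
	__type(value, int);
} sock_map SEC(".maps");

/* Stream parser: treat everything received so far as one message. */
SEC("sk_skb/stream_parser")
int parser(struct __sk_buff *skb)
{
	return skb->len;
}

/* Stream verdict: redirect the payload to the peer socket in slot 0
 * instead of letting it travel up to the user-space proxy. */
SEC("sk_skb/stream_verdict")
int verdict(struct __sk_buff *skb)
{
	return bpf_sk_redirect_map(skb, &sock_map, 0, 0);
}

char _license[] SEC("license") = "GPL";
```

User space then attaches the two programs to the map (BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT) and inserts the connected socket fds with bpf_map_update_elem(); a real proxy would pick the redirect key per direction instead of hard-coding slot 0.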
The backup solution, if none of these works, is to try a netfilter hook with the skb_send_sock function, but that version is very fragile and hacky.
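Roughly, that fallback would look like the sketch below: a module hooks LOCAL_IN, steals matching TCP packets, and pushes their payload straight onto the peer socket. Everything here is an assumption for illustration: the hard-coded hook point, the missing socket lookup and matching, and the skb_send_sock() call itself, whose availability and exact signature depend on the kernel version. Since the hook runs in softirq context, the send would most likely have to be deferred to process context (e.g. a workqueue), which is a big part of why this approach is fragile.

```c
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/skbuff.h>
#include <net/net_namespace.h>
#include <net/sock.h>

static struct sock *peer_sk; /* would point at the established peer socket */

static unsigned int proxy_hook(void *priv, struct sk_buff *skb,
			       const struct nf_hook_state *state)
{
	const struct iphdr *iph = ip_hdr(skb);
	const struct tcphdr *th;
	int hdr_len, payload_len;

	/* A real version would also match on addresses/ports and do
	 * pskb_may_pull() validation before touching the headers. */
	if (!peer_sk || iph->protocol != IPPROTO_TCP)
		return NF_ACCEPT;

	th = (const struct tcphdr *)((const u8 *)iph + iph->ihl * 4);
	hdr_len = iph->ihl * 4 + th->doff * 4;
	payload_len = skb->len - hdr_len;
	if (payload_len <= 0)
		return NF_ACCEPT;

	/* Push the TCP payload straight onto the peer socket and steal the
	 * packet from the stack. Calling a socket send from softirq context
	 * like this is exactly the fragile part. */
	if (skb_send_sock(peer_sk, skb, hdr_len, payload_len) < 0)
		return NF_ACCEPT;

	kfree_skb(skb);
	return NF_STOLEN;
}

static struct nf_hook_ops proxy_ops = {
	.hook     = proxy_hook,
	.pf       = NFPROTO_IPV4,
	.hooknum  = NF_INET_LOCAL_IN,
	.priority = NF_IP_PRI_FIRST,
};

static int __init proxy_init(void)
{
	return nf_register_net_hook(&init_net, &proxy_ops);
}

static void __exit proxy_exit(void)
{
	nf_unregister_net_hook(&init_net, &proxy_ops);
}

module_init(proxy_init);
module_exit(proxy_exit);
MODULE_LICENSE("GPL");
```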