Linux splice() and company

So, I’ve been working as ever, hard on getting some stuff to scale.

For example Lusca/Squid.

I’ve already done the non-threaded async event driven loop, for neoGUARDIAN, our transparent SMTP cunning gadget, and it’s been doing great snooping on SMTP traffic to find out if you’re a spammer or not.

I used Danga::Socket and some clever linux ioct’ls to get full transparency incorporated, meaning that you wouldn’t even know if I’m snooping on your TCP port or not. Interestingly enough, the interpreted perl is not whatsoever the bottleneck. I’ve been quite impressed with the scalability of the thing, especially when offloading the hard work to a Gearman style worker asynchronously.

Now, however something else has grabbed my attention.

The linux splice() and tee() syscalls.

Imagine a file descriptor, hooked up to a client socket, and another to a server socket. Typical relaying proxy style.

Now, instead of having to read() from one copying stuff to a buffer and then write()ing it to another buffer, you can simply tell the linux kernel to connect the two socket handles together, and the kernel will take care of connecting the source and destination without any user space copying.

Now add a tee() similar to the Unix “tee” command, and you can write the data being sent and received by ┬áthe two sockets to disk. In fact, they don’t even need to be sockets.

Can anyone say caching proxy ? Lusca has really made some strides in breaking out this kind of code so that it can be modularized, but it’s still going to take some serious hacking because the old squid code base really does depend so much on memcpy()ing things around.

The splice() functionality also supports splicing()ing from virtual memory and other kind of neat things.

It’s in my mind the ultimate way to get to the ZeroCopy style of I/O.

Certainly, portability is an issue, but then again, most modern systems start emulating Linux syscalls simply due to their elegance. It wasn’t always so, in fact Linux emulates a lot of BSD/SySV style stuff, but that comes with the Unix domain…

The question is… Is portability in code worth the performance penalty?

I’m almost keen to start a Lusca Linux-only port. Quite frankly I haven’t given a toss about portability ever, and I’m sure it would probably end up being a better product. Take nginx for example. Yes it will run on win32, but it sucks.

If your Operating System ┬ákernel doesn’t support some standard features required to get scalability then I’m sorry, but you’re fucked. Why should developers have to #ifdef crap just to make sure it compiles on some ancient thing?

The Java fanbois will probably jump at this opportunity to indicate the portability and awesomeness of their environment, but all I can say is that not living close to the kernel will always mean that you’re screwed due to the happy “compile once run anywhere” mantra. Good luck in “import System.Socket.Splice” or whatever it might be called in Java.

For me, it’s simple, either port it to your kernel or whine at your vendor (futile, isn’t it?)

In my mind it’s time to pick a platform and let the kernels match the portability. The effective implementations will survive. The crap ones will die.

Natural select()ion for the win.

Author: roelf on November 9, 2010
Category: Unix Development

Leave a Reply


Last articles