
Reinventing the “Cloud”

October 7, 2009

Seven years ago, when I still worked as the Internet Architect for I-Net Bridge, a company distributing market data (real-time stock information and news) in the South African market, I went to my boss, Paul Septhon, and said that we had to extend the real-time messaging layer (IML) to include ASCII-style messages so that it could be easily integrated into I-Net’s web delivery platforms.

IML (the I-Net Bridge Messaging Layer), as it was called at that point, was a publish/subscribe real-time messaging layer for distributing I-Net’s real-time data to its customers, from various data sources such as the JSE, Bridge, Dow Jones and the London Stock Exchange.

The problem was that the publisher and subscriber APIs were extremely event-driven, using callbacks, and were largely implemented in C or C++. When it came to developing our web applications, it proved difficult to integrate a callback-driven, binary-transport-focused system into web applications, which are typically “request-get-forget” style systems.

Thus was invented “CABS”, aka the “Common Application and Backoffice System”. CABS predated the service-oriented architectures and distributed systems we are seeing now by about six years. Using the existing reliable, binary-focused publish/subscribe system that was IML, I-Net developed a scalable ASCII-protocol-based client/server architecture that made things like Gearman look like amateur attempts.

The system supported load-balanced function calls, a complete directory-like tree structure, mount points for various publishers and a plethora of client and publisher interfaces, including TCL, PHP, Perl and C/C++.

Data could be accessed transparently across the entire “data” tree, with full ACL-based permissions enforced by the underlying IML layer, limiting clients’ access to data to just the publishers they subscribed to. Publishers could then implement finer-grained access control on top of that. We proceeded to implement one of the most feature-rich web-based MDDS syndication and publishing systems in South Africa on this architecture.
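To give a flavour of the idea, here is a hypothetical exchange; the verbs, paths and reply codes below are purely illustrative, not the actual CABS protocol. A client resolves a path in the data tree to a mounted publisher, and the IML-level ACLs decide whether the call is allowed at all:

    C: GET /jse/equities/AGL/last
    S: 200 AGL 28450 2009-10-07T14:32:11
    C: CALL /jse/depth getDepth symbol=AGL levels=5
    S: 403 DENIED (client not subscribed to /jse/depth)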

It was a phenomenal achievement and, I reckon, one of the grandest in South African development history, considering the time, the recent bursting of the .com bubble and everything that ensued afterwards. We even implemented user authentication and statistics gathering using this architecture. We had about eight Apache-based Linux front-end servers communicating with the “cloud” of distributed data publishers across multiple geographic locations.

The front-end Apaches ran mod_perl and HTML::Mason scripts that talked to the publishers with a simple ASCII-style protocol. The HTML::Mason components used aggressive memcached caching in order to scale.
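That caching was the classic cache-aside pattern; a minimal sketch in Perl with Cache::Memcached, where fetch_quote is a made-up stand-in for a call to a publisher:

    use Cache::Memcached;

    my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

    # Serve from cache when possible; otherwise ask the publisher and
    # remember the answer for 60 seconds.
    sub cached_quote {
        my ($symbol) = @_;
        my $key = "quote:$symbol";
        my $val = $memd->get($key);
        return $val if defined $val;
        $val = fetch_quote($symbol);    # hypothetical publisher call
        $memd->set($key, $val, 60);
        return $val;
    }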

Nowadays I hear about “Web 2.0” startups, dig into the architectures and systems they use, and have yet to find anything approaching the implementation we had at I-Net Bridge.

Until today, when I came across Gearman. Having been a memcached and danga.com fan for many years, I was surprised to finally see something that resembles the original I-Net Bridge CABS.

Gearman is very simple: a job-submission client, a “mnemonic function” based job router (gearmand), and a bunch of “workers” that actually do the work.
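A minimal sketch of those three roles, using the Perl Gearman::Worker and Gearman::Client modules and assuming a gearmand on localhost (the C server listens on 4730 by default, the older Perl one on 7003):

    # worker.pl: register a "mnemonic function" and serve jobs forever
    use Gearman::Worker;

    my $worker = Gearman::Worker->new;
    $worker->job_servers('127.0.0.1:4730');
    $worker->register_function(reverse => sub {
        my $job = shift;
        return scalar reverse $job->arg;   # the payload is an opaque string
    });
    $worker->work while 1;

    # client.pl: submit a job and wait for the result
    use Gearman::Client;

    my $client = Gearman::Client->new;
    $client->job_servers('127.0.0.1:4730');
    my $result = $client->do_task(reverse => 'hello world');
    print $$result, "\n";                  # do_task returns a scalar ref

gearmand routes each submitted “reverse” job to any worker that has registered the function, which is exactly the load-balanced function-call model CABS had.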

In terms of architecture it focuses on the basics, redundancy and scalability, and leaves all the rest of the complicated stuff, such as the actual handling of access control and the marshalling of data, as an “undefined contract” between the publisher and subscriber. Gearman simply handles the distribution and reliable queuing of tasks and responses. It doesn’t even have client authentication! Those gaps I can work around fairly easily…
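Since Gearman treats the payload as an opaque byte string, that “undefined contract” really just means both ends agreeing on an encoding. A sketch, reusing the $client and $worker from above and picking JSON as the agreed format (get_quote and lookup_quote are made-up names):

    use JSON;   # both ends agree on JSON; Gearman itself doesn't care

    # Client side: encode the request, decode the reply.
    my $payload = encode_json({ symbol => 'AGL', levels => 5 });
    my $result  = $client->do_task(get_quote => $payload);
    my $reply   = decode_json($$result);

    # Worker side: decode the request, encode the reply.
    $worker->register_function(get_quote => sub {
        my $args = decode_json($_[0]->arg);
        return encode_json(lookup_quote($args));   # hypothetical lookup
    });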

It is nowhere near as complicated as CABS was (nor do I think it will ever be), but having waved a sad goodbye to an amazing system at I-Net Bridge, I’m glad to finally find something that allows me to build some systems on a common distributable platform. I’ve been fiddling with PHP beans, UDP-based broadcasting of request queues and various other solutions for Neology’s carrier-grade caching, RADIUS and billing systems, and I’m relieved to have finally found some replacement “glue” to tie everything together again in a consistent fashion.

I intend to use Gearman for everything, including pinging my desktop 🙂