random walk through fields of musings

Sunday, March 21, 2010

implementing high-volume queues cheaply

Implementing write-heavy queues is hard to do in a technically "inexpensive" way. RDBMSs are typically read-optimized and too heavy for the task of a simple queue, and dedicated queuing packages need frameworks to run in. The simplest "web-writable", chronologically ordered queue I could come up with is to use the webserver logs themselves as the queue. Serving a static file, such as a text file named "ok.txt" that contains only the string "ok", is efficient in most webservers, and a small file is easily cached, so physical disk IO should be limited to writing the webserver access logs. To add an item to the queue, put it in query parameters, which get recorded in the weblogs and can be parsed out later, e.g.:

http://my_example_server.com/some_path/ok.txt?key=addme&value=withThisVal&whereAmI=London&why=toTrack

The query params won't be "interpreted" by the static file; they just show up in the weblogs, as long as the webserver is set to log GET query params.
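To make the "parsed out" step concrete, here is a minimal sketch in Python. It assumes the access log uses the Common/Combined Log Format, where the request appears as a quoted "METHOD target PROTOCOL" field; the function name parse_queue_item, the sample log line, and its IP address are made up for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches the quoted request field of a Common/Combined Log Format line,
# e.g. "GET /some_path/ok.txt?key=addme HTTP/1.1" (assumed log layout).
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<protocol>[^"]*)"')

def parse_queue_item(log_line, queue_path="/some_path/ok.txt"):
    """Return the query params of a GET to queue_path as a dict, else None."""
    match = REQUEST_RE.search(log_line)
    if match is None or match.group("method") != "GET":
        return None
    target = urlsplit(match.group("target"))
    if target.path != queue_path:
        return None  # some other request, not a queue write
    # parse_qsl also URL-decodes the values; dict() keeps the last of any repeated key
    return dict(parse_qsl(target.query, keep_blank_values=True))

# Hypothetical log line for the example URL above
sample = ('203.0.113.7 - - [21/Mar/2010:10:15:32 +0000] '
          '"GET /some_path/ok.txt?key=addme&value=withThisVal'
          '&whereAmI=London&why=toTrack HTTP/1.1" 200 2')
print(parse_queue_item(sample))
```

Requests for any other path come back as None, so the same log can keep serving normal traffic without polluting the queue.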


Most webservers allow writing the timestamp in a format that is easily machine readable (milliseconds since the epoch in UTC is probably a good choice). Using Spread to write the logfile to the network in real time would reduce the disk IO on the local webserver. A Spread listener would still have to write the entries somewhere, though the ability to add multiple listeners could spread the load.
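As a rough sketch of why an integer timestamp is convenient, the Python below reads entries that arrived after a checkpoint. It assumes a hypothetical custom log format in which each line starts with milliseconds since the epoch, followed by a space and the rest of the entry; the name read_since and the sample lines are made up for illustration.

```python
from datetime import datetime, timezone

def read_since(lines, last_seen_ms=-1):
    """Yield (timestamp_ms, rest_of_line) for entries newer than last_seen_ms."""
    for line in lines:
        stamp, _, rest = line.strip().partition(" ")
        if not stamp.isdigit():
            continue  # skip lines that do not start with an integer timestamp
        timestamp_ms = int(stamp)
        if timestamp_ms > last_seen_ms:
            yield timestamp_ms, rest

# Hypothetical log lines in the assumed "<ms since epoch> <request>" layout
log = [
    '1269166532123 "GET /some_path/ok.txt?key=a HTTP/1.1"',
    '1269166532456 "GET /some_path/ok.txt?key=b HTTP/1.1"',
]

for timestamp_ms, rest in read_since(log, last_seen_ms=1269166532123):
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    print(when.isoformat(), rest)
```

A plain integer comparison is all the consumer needs to resume where it left off. One caveat of this sketch: two entries logged in the same millisecond share a timestamp, so a strict "newer than" check can skip the second one; a real consumer would probably track a file offset as well.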