I participated in the ICFP programming contest for 2006 over this weekend. I was taking part in it purely for the fun of it (yes, really). It turned out to be a very interesting task. Check out the task description to see what I mean.
You have to first implement an interpreter for a simple but quirky virtual machine called the Universal Machine (UM). You then use this interpreter on the given data file ("codex.umz") to run a small programme that takes a password and uses this password to decrypt and uncompress a binary image. You then use the interpreter on this extracted binary image to discover a small UNIX-like multi-user system where you can log in as "guest". This system, called "UMIX", includes a set of common UNIX utilities like "ls", "rm", "cat", "more", etc., an interpreter for BASIC (called "Qvikbasic", that only recognises numbers written with Roman numerals) and a file-transfer utility called "umodem".
Using the guest account, you complete a simple password-cracking utiliy written in BASIC and use it to discover the passwords of a couple of other user accounts. When you log in as those users, you find more tasks that you can complete to discover the passwords of other user accounts, which yield even more tasks and so on till you get the "root" password. For example, one of the tasks was in the form of an Adventure-like game and another was using a "2-dimensional" programming language, complete with ASCII-art boxes for each function, to solve a given problem.
It was incredible and this is easily one of the best ICFPC tasks I have ever attempted.
The organisers also provided a handy benchmark for testing your UM implementation (called "SANDmark") and a reference UM implementation (in the form of a UM binary of course). If you want to quickly check it out, here is my UM implementation (written in C). You can use "(\b.bb)(\v.vv)06AAVKIru7p0OmCvaT" as the password to extract the UMIX binary image from "codex.umz" using this interpreter. (Paul Ingemi has even implemented a UM with JIT compilation using LuaJIT!)
When I first implemented the UM, it ran quite slowly - it took a very long time for the UMIX login prompt to appear and over three hours just to compile "hack.bas" with Qvikbasic. I first tried simple tricks like converting indexed array accesses to incremental pointer accesses, caching values that were being used often, using a function-pointer-based dispatcher table to decode and execute instructions instead of a big switch statement, etc. but nothing helped. I found that the UM binaries allocate and deallocate a lot of "arrays" of various sizes and inspired by one of the posts to the ICFPC 2006 mailing list, I even implemented a system where arrays were only allocated in sizes of the smallest power of two larger than the requested size and recycled internally instead of being returned back to the operating system. However, even that did not help much.
That is when I decided to stop fooling around and ran the interpreter through qprof (I feel gprof and oprofile are not as immediately useful as qprof is - try it out and you would probably agree). I discovered that the real bottleneck was in the code that searched for a free slot to assign an index to a newly allocated array. When I eliminated that bottleneck by explicitly keeping track of freed slots, the performance of the interpreter improved drastically and it became usefully responsive. I eliminated the power-of-two array size foolery and it was still pretty responsive - malloc()/free() are really implemented pretty well in GNU libc and it is not usually worth it to second-guess it (except perhaps in extreme cases). (Also check out Doug Lea's notes on the design of a memory allocator.)
One of the wicked ideas from the mailing list did help however - using the pointer to an array's memory as its index instead of maintaining an "array of arrays", yielded a 20% boost in the performance of my UM implementation (as measured by SANDmark) but reduced its "portability" to only those machines where both integers and pointers are 32 bits.
The lesson learnt? We programmers are really quite bad at guessing the hotspots in our programmes. We should always use a profiler to find out where the real bottlenecks are in our programmes.
By the way, a few hapless souls discovered that the lack of unsigned integers in Java makes it unnecessarily difficult to implement something like this in Java. Why exactly couldn't the creators of the language provide unsigned variants of the integral types?