Warning: there's no category theory in this letter - it's purely a reponse to the discussion of execution speed of FISh quicksort. Barry Jay wrote:
Why is quicksort in FISh faster? =================================
I would like to propose another reason the FISh quicksort appears to be faster than the library qsort - they aren't the same algorithm. In particular, the standard library qsort and the FISh sort are using different methods to pick the pivot. Choice of pivot in quicksort is important to ensure the worst case doesn't degenerate to O(n^2). The choice of pivot in the FISh code is always the first element, which does give very poor performance in the case the array is already sorted. I'm not sure what the qsort in the libc i have here does. However, typical tricks include selecting the median of three elements at the bottom, middle and top of the array, choosing a random element, choosing the median of three random elements, and so on. Importantly, each of these choices requires some work, and thus negatively impacts on the average speed of the sort in order to alleviate the worst case speed. Caveat: I'm not a professional benchmarker/tester, and the tests below are probably very flawed. I have tried to give details on everything that seems relevant, but your mileage may vary. This data is more given to suggest an alternative, plausible explanation for the observed speed difference in the FISh and C benchmarks, than to claim i have identified all the issues in the timing results found for FISh and C quicksort. To test this hypothesis, i coded up quicksort in C++ using the same pivot picking strategy as the FISh code. I then modified my C++ code, and the C benchmark code and C translated FISh code (CTFC) from the FISh website (by CTFC i mean the contents of test01.c), by altering the random number generator to allow specification of two command line arguments giving the multiplier and step for the seeding algorithm. i.e rather than seed = seed * 25173 + 17431 i used seed = seed * mult + 17431 for varying mult. This allowed me to test the extent to which the algorithm was sensitive to the data being sorted. What follows are timing values on a DGUX AlphaStation 255 (233 MHz) with mult = 1,2,3, ... ,10 (on an array of 10000 values). For each program, i give the total (user + system) time reported by the builtin time function of bash version 2.02.0(1)-release (alpha-dec-osf4.0) (averaged over 10 trials in each case). The time columns are libqsort (= the standard library qsort), cppqsort (= my C++ qsort), fishqsort (= CTFC). They were compiled with gcc, g++, gcc in each case, and -O optimization. NOTE: With these mult values, the data supplied to the qsort is deliberately nonrandom to try and detect the effect proposed in the second paragraph. mult libqsort cppqsort fishqsort 1 .0350 3.462 3.410 2 .0348 4.039 4.039 3 .1003 .0334 .0391 4 .0350 4.085 4.052 5 .1130 .0332 .0393 6 .0356 4.042 4.039 7 .1025 .0341 .0382 8 .0357 4.061 4.054 9 .1088 .0336 .0382 10 .0355 4.053 4.041 Observe that the libqsort has much better average performance (and that its profile is different to the other two - one could probably determine something about the choice of pivots with enough tests ;-). Also, the CTFC at best beats the library by a factor a bit under 3, whereas the library beats the CTFC by a factor of over 110. More importantly, the C++ and fish code run in very similar time, neither consistently beating the other. Also, their profiles are almost identical. I also ran one bigger test - in the case of a 90000 element array consisting of 1 to 90,000 in order, the library took around .3 seconds, whereas the CTFC overflowed the stack and crashed just under 200 seconds into the test (the C++ code took 315 seconds to overflow the stack and crash - i'm not sure what's causing the difference here, but i suspect it has to do with the fact my C++ code used paramaters rather than a static struct). Based on this, I'm proposing that the standard library quicksort is trading off speed for the average cases to bring the worst cases down to something acceptable. This is presumably a serious issue for a library quicksort - its better to run on average somewhat slower in order to get consistency, as this makes applications easier to test for market acceptance of speed, rather than the application apparently hanging when the user manages to sort the worst case database with about 1 million entries in it (when your testers happily sorted their million entry db's with no trouble). one other point - the compile time polymorphism is not specific to FISh (as Ralph Loader mentioned). You write:
The FISh program avoids making a function call at all. That is, instead of passing the addresses to a function, a primitive comparison is made. The key point is that this can only be done because information like the length of the array and the size of the entries (in bytes) can be inferred by the FISh compiler, instead of being supplied by the user.
C++ templates also give the ability to supply a compile time comparision function to sort routines, the code will likewise be specialised on the given comparison and on the size of the entries. One could write sort templates that take the length as well, however since the size of stuff to be sorted typically varies depending on the current application state (i.e. in a manner unknown at compile time) this tends not to be enough of a win to be worth doing. - robbie -- *--------------------------------> + <---------------------------------* |_| robbie gates | pgp key available |_| V category theorist V //cat.maths.usyd.edu.au/~robbie V *--------------------------------> + <---------------------------------*