Dear Ralph, thank you for taking an interest in the FISh performance results for quicksort. You raise two basic questions. Is the FISh program faster? If so, why? Is quicksort in FISh faster than qsort (on arrays of doubles)? ==============================================================
(1) Your C code compares using two floating point operations (subtraction, compare), while the fish program uses just one (no subtraction).
This makes a small difference to the overall performance of the algorithm. I re-ran the test for qsort with the following comparison function int comp(const void *i, const void *j) { return (*(double*)i > *(double*)j ) ; } that has only one operation. The C results are better by about 10% but FISh is still twice as fast. C benchmark1 C benchmark2 FISh benchmark 3.59 3.35 1.65
(3) The C library qsort takes a three-valued comparison function, while your fish code uses a two-valued comparison.
I'm not sure what point you're making. The designers of qsort had a free choice. I suspect that the three-valued comparison is used to exploit equality of values, to obtain greater efficiency. In any event, it is unlikely to make much difference in the light of (1) above. Conclusion: The FISh program *is* faster. Why is quicksort in FISh faster? =================================
(4) As you note, your fish code gets optimised by specialising to the datatype and comparison function used.
Yes. This is a given. The question is to pinpoint the source of the speedup.
... This is the main performance issue, and has nothing to do with boxing/unboxing issues.
Point taken. The data supplied to quicksort is not boxed, but this is because (the length of the array and) the size of the array entries are given as explicit parameters. In this sense qsort is *not* polymorphic in its choice of data type. If it were not provided by the C programmer then boxing would add another layer of indirection to the program. Such information about the size of entries etc. is inferred by the FISh compiler.
... Whether [FISh is faster] will depend on things like instruction scheduling in the compiler & CPU, & register allocation in the caller & in any case the difference will be tiny.
Your conclusion does not match the performance figures above. The FISh program avoids making a function call at all. That is, instead of passing the addresses to a function, a primitive comparison is made. The key point is that this can only be done because information like the length of the array and the size of the entries (in bytes) can be inferred by the FISh compiler, instead of being supplied by the user. Yours, Barry Jay ************************************************************************* | Associate Professor C.Barry Jay, | | Reader in Computing Sciences Phone: (61 2) 9514 1814 | | Head, Algorithms and Languages Group, Fax: (61 2) 9514 1807 | | University of Technology, Sydney, e-mail: cbj@socs.uts.edu.au | | P.O. Box 123 Broadway, 2007, http://www-staff.socs.uts.edu.au/~cbj | | Australia. FISh homepage ... ~cbj/FISh | *************************************************************************
Warning: there's no category theory in this letter - it's purely a reponse to the discussion of execution speed of FISh quicksort. Barry Jay wrote:
Why is quicksort in FISh faster? =================================
I would like to propose another reason the FISh quicksort appears to be faster than the library qsort - they aren't the same algorithm. In particular, the standard library qsort and the FISh sort are using different methods to pick the pivot. Choice of pivot in quicksort is important to ensure the worst case doesn't degenerate to O(n^2). The choice of pivot in the FISh code is always the first element, which does give very poor performance in the case the array is already sorted. I'm not sure what the qsort in the libc i have here does. However, typical tricks include selecting the median of three elements at the bottom, middle and top of the array, choosing a random element, choosing the median of three random elements, and so on. Importantly, each of these choices requires some work, and thus negatively impacts on the average speed of the sort in order to alleviate the worst case speed. Caveat: I'm not a professional benchmarker/tester, and the tests below are probably very flawed. I have tried to give details on everything that seems relevant, but your mileage may vary. This data is more given to suggest an alternative, plausible explanation for the observed speed difference in the FISh and C benchmarks, than to claim i have identified all the issues in the timing results found for FISh and C quicksort. To test this hypothesis, i coded up quicksort in C++ using the same pivot picking strategy as the FISh code. I then modified my C++ code, and the C benchmark code and C translated FISh code (CTFC) from the FISh website (by CTFC i mean the contents of test01.c), by altering the random number generator to allow specification of two command line arguments giving the multiplier and step for the seeding algorithm. i.e rather than seed = seed * 25173 + 17431 i used seed = seed * mult + 17431 for varying mult. This allowed me to test the extent to which the algorithm was sensitive to the data being sorted. What follows are timing values on a DGUX AlphaStation 255 (233 MHz) with mult = 1,2,3, ... ,10 (on an array of 10000 values). For each program, i give the total (user + system) time reported by the builtin time function of bash version 2.02.0(1)-release (alpha-dec-osf4.0) (averaged over 10 trials in each case). The time columns are libqsort (= the standard library qsort), cppqsort (= my C++ qsort), fishqsort (= CTFC). They were compiled with gcc, g++, gcc in each case, and -O optimization. NOTE: With these mult values, the data supplied to the qsort is deliberately nonrandom to try and detect the effect proposed in the second paragraph. mult libqsort cppqsort fishqsort 1 .0350 3.462 3.410 2 .0348 4.039 4.039 3 .1003 .0334 .0391 4 .0350 4.085 4.052 5 .1130 .0332 .0393 6 .0356 4.042 4.039 7 .1025 .0341 .0382 8 .0357 4.061 4.054 9 .1088 .0336 .0382 10 .0355 4.053 4.041 Observe that the libqsort has much better average performance (and that its profile is different to the other two - one could probably determine something about the choice of pivots with enough tests ;-). Also, the CTFC at best beats the library by a factor a bit under 3, whereas the library beats the CTFC by a factor of over 110. More importantly, the C++ and fish code run in very similar time, neither consistently beating the other. Also, their profiles are almost identical. I also ran one bigger test - in the case of a 90000 element array consisting of 1 to 90,000 in order, the library took around .3 seconds, whereas the CTFC overflowed the stack and crashed just under 200 seconds into the test (the C++ code took 315 seconds to overflow the stack and crash - i'm not sure what's causing the difference here, but i suspect it has to do with the fact my C++ code used paramaters rather than a static struct). Based on this, I'm proposing that the standard library quicksort is trading off speed for the average cases to bring the worst cases down to something acceptable. This is presumably a serious issue for a library quicksort - its better to run on average somewhat slower in order to get consistency, as this makes applications easier to test for market acceptance of speed, rather than the application apparently hanging when the user manages to sort the worst case database with about 1 million entries in it (when your testers happily sorted their million entry db's with no trouble). one other point - the compile time polymorphism is not specific to FISh (as Ralph Loader mentioned). You write:
The FISh program avoids making a function call at all. That is, instead of passing the addresses to a function, a primitive comparison is made. The key point is that this can only be done because information like the length of the array and the size of the entries (in bytes) can be inferred by the FISh compiler, instead of being supplied by the user.
C++ templates also give the ability to supply a compile time comparision function to sort routines, the code will likewise be specialised on the given comparison and on the size of the entries. One could write sort templates that take the length as well, however since the size of stuff to be sorted typically varies depending on the current application state (i.e. in a manner unknown at compile time) this tends not to be enough of a win to be worth doing. - robbie -- *--------------------------------> + <---------------------------------* |_| robbie gates | pgp key available |_| V category theorist V //cat.maths.usyd.edu.au/~robbie V *--------------------------------> + <---------------------------------*
participants (2)
-
Barry Jay -
Robbie Gates