  #13  
Old 11-18-2007, 08:05 PM
Wyman
Senior Member
 
Join Date: Mar 2007
Location: MI, at least for a few yrs =(
Posts: 222
Re: Best Sorting Algorithm For Large Numbers

[ QUOTE ]
[ QUOTE ]
"Fastest" time-complexity wise with lots of numbers and bound range would be Radix Sort, assuming no previous knowledge of data being sorted (which allows more special case sorting).

[/ QUOTE ]

Wyman,

I read this quote from the radix wikipedia entry:

When an LSD radix sort processes keys which all have the same fixed length, then the upper bound of the execution time is O(n), where n is the number of keys to be sorted. When processing fixed-length keys, some implementations of LSD radix sorts are slower than O(n · log n) comparison-based sorting algorithms, such as heap sort, unless a sufficiently large amount of input data is processed. What a sufficiently large amount of input data precisely is will vary from computer system to computer system and from implementation to implementation.

It is not clear to me exactly what conditions make radix sort better than quicksort or heapsort. Could you explain this?

[/ QUOTE ]

Sorry I sort of forgot about this thread.

The idea is (roughly) that an algorithm is O(f(n)) if, as n gets large, an input of size n (in this case, n numbers to be sorted) takes at most C*f(n) time to run, where C is a constant that does not depend on n, and where "time" is usually measured in counts of the slowest operation -- the number of comparisons, for instance. You may want to reread this a few times.

Example: I have a list of N numbers. To find the largest element in the list, I can just scan through the list once, keeping a largest-so-far and comparing each new element against it. This takes O(1) space (the amount of space required is independent of N) and O(N) time (the number of comparisons is at worst linear in N).
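Here's that scan as a minimal Python sketch (my own illustration):

[ CODE ]
def find_max(xs):
    """One pass over the list: O(1) space, O(N) comparisons."""
    largest = xs[0]               # largest-so-far
    for x in xs[1:]:
        if x > largest:           # one comparison per element
            largest = x
    return largest

print(find_max([3, 41, 7, 2, 19]))   # 41
[/ CODE ]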

Now, as it turns out, if the *size* of the numbers on your list is bounded (e.g. all numbers are < 99999), then a radix sort is O(n), where n is the size of the list to sort. (More precisely, it's O(d*n) where d is the number of digits per key, which is O(n) when d is a fixed constant -- that's the "fixed length" condition in the Wikipedia passage you quoted.) Quicksort and heapsort are O(n log n).
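For concreteness, here's a minimal LSD radix sort sketch in Python, assuming non-negative integers below 10^5 as in the example above (base 10 for readability; real implementations usually pick a larger base):

[ CODE ]
def radix_sort(xs, max_digits=5):
    """LSD radix sort for non-negative ints < 10**max_digits.
    Each pass is a stable bucket pass on one digit, so the whole
    sort is O(max_digits * n) -- O(n) when max_digits is fixed."""
    for d in range(max_digits):
        buckets = [[] for _ in range(10)]
        for x in xs:
            buckets[(x // 10**d) % 10].append(x)   # route by d-th digit
        xs = [x for bucket in buckets for x in bucket]
    return xs

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
[/ CODE ]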

The catch is that buried in the definition of "O" is that constant C. An algorithm that achieves its goal in 10000*N comparisons is O(N), yet it may be slower than an algorithm that runs in 3*N log N comparisons (which is O(N log N)) unless N is sufficiently large. With those particular constants, the O(N) algorithm only pulls ahead once log2(N) > 3333 -- i.e., for astronomically large N.

So while an O(N) algorithm is *asymptotically* faster than an O(N log N) algorithm, it's not the case that for every input the O(N) algorithm will run faster than the O(N log N) algorithm.
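If you want to see where the crossover falls on your own machine, here's a rough timing harness (it reuses the radix_sort sketch above; the numbers you get will vary from system to system and implementation to implementation, which is exactly the Wikipedia article's point):

[ CODE ]
import random
import timeit

# assumes radix_sort from the sketch above is already defined
xs = [random.randrange(10**5) for _ in range(100000)]

t_radix = timeit.timeit(lambda: radix_sort(list(xs)), number=5)
t_nlogn = timeit.timeit(lambda: sorted(xs), number=5)   # O(n log n) library sort

print("radix sort:   %.3fs" % t_radix)
print("library sort: %.3fs" % t_nlogn)
[/ CODE ]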


Aside: If you don't have an upper bound on the size of your numbers, the (provably) fastest comparison-based sorting algorithms are O(n log n). For the reasons above, many sort routines built into software libraries use quicksort (O(n log n)) when n > 7, and something like bubble sort or insertion sort (O(n^2)) when n <= 7. Even though the latter algorithms are asymptotically slower, for small enough instances they are pretty fast.
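As a toy illustration of that hybrid strategy (not how any particular library actually implements it -- the threshold and the quicksort details here are simplified):

[ CODE ]
THRESHOLD = 7   # below this, insertion sort's small constant wins

def insertion_sort(xs):
    """O(n^2) worst case, but very low overhead on tiny inputs."""
    for i in range(1, len(xs)):
        x, j = xs[i], i - 1
        while j >= 0 and xs[j] > x:
            xs[j + 1] = xs[j]      # shift larger elements right
            j -= 1
        xs[j + 1] = x
    return xs

def hybrid_sort(xs):
    """Quicksort on large inputs, insertion sort on small ones."""
    if len(xs) <= THRESHOLD:
        return insertion_sort(list(xs))
    pivot = xs[len(xs) // 2]
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return hybrid_sort(less) + equal + hybrid_sort(greater)

print(hybrid_sort([9, 1, 8, 2, 7, 3, 6, 4, 5, 0]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[/ CODE ]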