Here is solid evidence that a GPU can sort data (using the radix sort algorithm) several times faster than a CPU.
Duane Merrill and Andrew Grimshaw of the Computer Engineering Department at the University of Virginia in Charlottesville have released SRTS Radix Sort, a free sorting implementation. On a GTX 480 it sorts more than 1 billion 32-bit keys per second, roughly four times faster than a Core i7 processor.
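Radix sort orders fixed-width keys digit by digit rather than by pairwise comparison. Each pass reduces to a histogram, a prefix sum (scan) and a scatter, which is why it parallelizes well on GPUs. The sketch below is a sequential Python illustration of a least-significant-digit (LSD) radix sort on 32-bit unsigned keys. It is not the SRTS CUDA code. The function name `radix_sort_u32` and the default 4-bit digit width are assumptions chosen for readability.

```python
def radix_sort_u32(keys, radix_bits=4):
    """Sort 32-bit unsigned integers with an LSD radix sort (illustrative sketch).

    Each pass is a stable counting sort on one digit: count the digit values,
    take an exclusive prefix sum to get bucket offsets, then scatter the keys.
    """
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    for shift in range(0, 32, radix_bits):
        # 1. Histogram: how many keys have each value of the current digit.
        counts = [0] * num_buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: first output index for each bucket.
        offsets = [0] * num_buckets
        total = 0
        for d in range(num_buckets):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter: keys keep their relative order within a bucket,
        #    so ordering established by earlier (lower) digits is preserved.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


print(radix_sort_u32([0xFFFFFFFF, 42, 0, 7, 42]))
```

With 4-bit digits a 32-bit key needs eight passes. Wider digits mean fewer passes but larger histograms, and that trade-off is what GPU implementations tune.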
The method works on any CUDA-capable device. The current version can sort any built-in numeric C/C++ type (for example, signed char, float or unsigned long long). It also includes an automatic optimization for the case where all keys share the same digit in a given radix pass: that pass is short-circuited and runs about five times faster.
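A radix sort over signed integers or floats generally works by first mapping each value to an unsigned bit pattern whose unsigned order matches the value's order. This is a widely used trick, assumed here rather than taken from SRTS. The Python sketch below shows that mapping for 32-bit signed integers and single-precision floats. It also shows one plausible form of a digit-pass early exit: if every key has the same digit at the current position, a stable pass cannot change the order, so it is skipped. All names (`float_to_key`, `int32_to_key`, `radix_sort_by_key`) and the 4-bit digit width are hypothetical.

```python
import struct

RADIX_BITS = 4                  # hypothetical digit width for this sketch
MASK = (1 << RADIX_BITS) - 1


def float_to_key(x):
    """Map a float (rounded to IEEE-754 single precision, must fit in its
    range) to a 32-bit unsigned key with the same order. NaN is not handled."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        # Negative: flip every bit so larger magnitudes get smaller keys.
        return bits ^ 0xFFFFFFFF
    # Non-negative: set the sign bit so these sort after all negatives.
    return bits | 0x80000000


def int32_to_key(x):
    """Map a signed 32-bit integer to an order-preserving unsigned key."""
    # Flipping the sign bit of the two's-complement pattern maps
    # [-2**31, 2**31) onto [0, 2**32) without changing the order.
    return (x & 0xFFFFFFFF) ^ 0x80000000


def radix_sort_by_key(items, key_fn):
    """Stable LSD radix sort of items by key_fn(item), with early-exit passes."""
    keyed = [(key_fn(v), v) for v in items]
    if len(keyed) < 2:
        return [v for _, v in keyed]
    for shift in range(0, 32, RADIX_BITS):
        first = (keyed[0][0] >> shift) & MASK
        if all(((k >> shift) & MASK) == first for k, _ in keyed):
            # Early exit: every key has this same digit, so a stable pass
            # would leave the order unchanged. Skip the pass entirely.
            continue
        buckets = [[] for _ in range(MASK + 1)]
        for k, v in keyed:
            buckets[(k >> shift) & MASK].append((k, v))
        # Concatenating buckets in digit order keeps the pass stable.
        keyed = [pair for bucket in buckets for pair in bucket]
    return [v for _, v in keyed]


print(radix_sort_by_key([3.25, -2.5, 0.0, -0.5], float_to_key))
print(radix_sort_by_key([4, 1, 3, 2], int32_to_key))
```

For small non-negative integers such as `[4, 1, 3, 2]`, every digit above the lowest is the same for all keys, so only one of the eight passes actually moves data.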