📜 ⬆️ ⬇️

Sorting a million 32-bit ints in 2 megabytes of memory on Python

My translation of the article by Guido van Rossum:

I was jokingly asked if I could sort out a million 32-bit ints in 2 megabytes of memory on Python. While thinking, it occurred to me to use an I / O mechanism using buffer memory.

In general, this is a comic question - the data alone will take 4 megabytes, subject to a binary representation! True, you can go to the trick - take a file containing a million 32-bit int'ov. How to sort them using the minimum amount of memory? It must be some sort of merge sort, in which small pieces of data are sorted and written to a temporary file, after which the temporary files are merged to produce the final result.
')
Here is my solution:

Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  1. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  2. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  3. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  4. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  5. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  6. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  7. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  8. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  9. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  10. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  11. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  12. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  13. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  14. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  15. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  16. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  17. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  18. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  19. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  20. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  21. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  22. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  23. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  24. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  25. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  26. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  27. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  28. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)
  29. Copy Source | Copy HTML #!/usr/bin/env python3.0 import sys, array, tempfile, heapq assert array . array ( 'i' ).itemsize == 4 def intsfromfile (f): while True: a = array . array ( 'i' ) a.fromstring(f.read( 4000 )) if not a: break for x in a: yield x iters = [] while True: a = array . array ( 'i' ) a.fromstring( sys .stdin.buffer.read( 40000 )) if not a: break f = tempfile .TemporaryFile() array . array ( 'i' , sorted (a)).tofile(f) f.seek( 0 ) iters.append( intsfromfile (f)) a = array . array ( 'i' ) for x in heapq .merge(*iters): a.append(x) if len (a) >= 1000 : a.tofile( sys .stdout.buffer) del a[:] if a: a.tofile( sys .stdout.buffer)

On my 3-year-old Linux-based personal, the execution took 5.4 seconds when reading data from an input file containing exactly a million random 32-bit ints. This is not so bad, considering that when reading data directly from memory, sorting takes 2 seconds:

Copy Source | Copy HTML
  1. #! / usr / bin / env python3.0
  2. import sys, array
  3. a = array . array ( 'i' , sys .stdin.buffer.read ())
  4. a = list (a)
  5. a.sort ()
  6. a = array . array ( 'i' , a)
  7. a.tofile ( sys .stdout.buffer)

Let's return to our sorting. The first 3 lines are obvious:

Copy Source | Copy HTML
  1. #! / usr / bin / env python3.0
  2. import sys, array, tempfile, heapq
  3. assert array . array ( 'i' ) .itemsize == 4

The first line indicates that we are using Python 3.0. The second line imports the required modules. The third line causes an interrupt on 64-bit systems where int is not 32-bit — I was not tasked with writing portable code.

Then we declared a function that reads the ints from the file:

Copy Source | Copy HTML
  1. def intsfromfile (f):
  2. while true:
  3. a = array . array ( 'i' )
  4. a.fromstring (f.read ( 4000 ))
  5. if not a:
  6. break
  7. for x in a:
  8. yield x

This is the very place where the optimization of the algorithm takes place: it reads up to 1000 ints at a time, giving them out one by one. At first, I did not use buffering - the function simply read 4 bytes from the file, converted it to an integer, and returned it. But it worked 4 times slower! Please note that we cannot use a.fromfile(f, 1000) , because the fromfile() method will return an error if there is not enough data in the file, but I wanted the code to adapt to any number of ints in the file.

Next comes the input cycle. It cyclically reads a piece of 10'000 ints from the input file, sorts them into memory and writes it into a temporary file. Then we will add an iterator for the temporary file to the list of iterators we need in the final stage using the intsfromfile () function described above.

Copy Source | Copy HTML
  1. iters = []
  2. while true:
  3. a = array . array ( 'i' )
  4. a.fromstring ( sys .stdin.buffer.read ( 40000 ))
  5. if not a:
  6. break
  7. f = tempfile .TemporaryFile ()
  8. array . array ( 'i' , sorted (a)). tofile (f)
  9. f.seek ( 0 )
  10. iters.append (intsfromfile (f))

Accordingly, 100 temporary files of 10,000 values ​​each will be created for the input million ints.

Finally, we merge all these files (each already sorted) together. The heapq module contains a remarkable function for solving this problem: heapq.merge(iter1, iter2, ... ) , which returns an iterator that passes input parameters in a sequence in which each input parameter itself passes its value in the correct order (as in this case).

Copy Source | Copy HTML
  1. a = array . array ( 'i' )
  2. for x in heapq .merge (* iters):
  3. a.append (x)
  4. if len (a)> = 1000 :
  5. a.tofile ( sys .stdout.buffer)
  6. del a [:]
  7. if a:
  8. a.tofile ( sys .stdout.buffer)

This is another case in which buffered input / output becomes necessary: ​​writing each individual value to a file, as soon as it becomes available, reduces the rate of operation of the algorithm by half. Using input and output buffering, we managed to increase the execution speed by 10 times!

Source: https://habr.com/ru/post/65503/


All Articles