Does Sort Really Fall Back to Disk? |
May 27th, 2025 |
tech |
The sort command is clever: to sort very large files it does a series of in-memory sorts, saving sorted chunks to temporary files, and then merges those chunks. Except this often doesn't work anymore.
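To make the chunk-and-merge idea concrete, here's a rough sketch of the same approach done by hand with split and sort -m. This is an illustration of the technique, not what sort does internally; the file names (input.txt, chunk.*) and the tiny two-line chunk size are made up for the example.

```shell
# Hand-rolled external merge sort, for illustration only.
workdir=$(mktemp -d)

# Stand-in for a file too large to sort in memory.
printf '%s\n' pear apple fig kiwi banana cherry > "$workdir/input.txt"

# 1. Split the input into chunks small enough to sort in memory
#    (two lines each here).
split -l 2 "$workdir/input.txt" "$workdir/chunk."

# 2. Sort each chunk on its own, writing the result back in place.
for chunk in "$workdir"/chunk.*; do
    LC_ALL=C sort -o "$chunk" "$chunk"
done

# 3. Merge the already-sorted chunks; -m merges without re-sorting.
merged=$(LC_ALL=C sort -m "$workdir"/chunk.*)
printf '%s\n' "$merged"

rm -rf "$workdir"
```

The sorted chunks in step 2 play the role of sort's temporary files, which is why it matters where those files end up.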
Here's what I see if I run man sort and look at the documentation for --buffer-size:
use SIZE for main memory buffer
That's pretty terse! What does my Mac say?
Use size for the maximum size of the memory buffer. Size modifiers %,b,K,M,G,T,P,E,Z,Y can be used. If a memory limit is not explicitly specified, sort takes up to about 90% of available memory. If the file size is too big to fit into the memory buffer, the temporary disk files are used to perform the sorting.
Makes sense! But then the docs for --temporary-directory say:
use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
And these days /tmp is often memory-backed, via tmpfs. This changed in Fedora 18 (2013) and Ubuntu 24.10 (2024), and is changing in Debian 13 (in a month or two). On those systems sort's temporary files end up in memory too, instead of on disk.
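If you want to check a particular machine, here's one way to see what backs each directory. This is a sketch that assumes GNU coreutils stat on Linux (on macOS, stat -f means something different), and fs_type is just a helper name I made up for the example.

```shell
# Print the filesystem type backing a path.
# Assumes GNU coreutils stat: -f reports on the filesystem, %T is its type.
fs_type() {
    stat -f -c %T "$1"
}

for dir in /tmp /var/tmp; do
    printf '%s: %s\n' "$dir" "$(fs_type "$dir")"
done
```

A memory-backed /tmp shows up as tmpfs.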
It seems to me that these days it would be better for --temporary-directory to default to /var/tmp, which is preserved across reboots and so will generally be backed by disk even on systems that use tmpfs for /tmp. In the meantime, sort --temporary-directory /var/tmp will do the trick.
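Here's a small end-to-end sketch of that workaround, using the short forms -T (--temporary-directory) and -S (--buffer-size). The input is made-up stand-in data, and the 1M buffer is deliberately tiny so that sort has a reason to use temporary files; whether it actually spills depends on the implementation.

```shell
# Use /var/tmp for temporary files when it is a writable directory; otherwise
# fall back to a fresh scratch directory so the example still runs.
spill_dir=/var/tmp
if [ ! -d "$spill_dir" ] || [ ! -w "$spill_dir" ]; then
    spill_dir=$(mktemp -d)
fi

input=$(mktemp)
output=$(mktemp)

# 50,000 numbers in descending order as stand-in data.
seq 50000 -1 1 > "$input"

# -S caps the in-memory buffer, -T picks where temporary files go.
sort -n -S 1M -T "$spill_dir" -o "$output" "$input"

first_line=$(head -n 1 "$output")
line_count=$(wc -l < "$output" | tr -d ' ')
printf 'first line: %s, lines: %s\n' "$first_line" "$line_count"

rm -f "$input" "$output"
```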