Does Sort Really Fall Back to Disk?

May 27th, 2025
tech
The Unix sort command is clever: to sort very large files it does a series of in-memory sorts, saving each sorted chunk to a temporary file, and then merges those chunks. Except this often doesn't work anymore.
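To make the chunk-then-merge idea concrete, here is a minimal sketch in Python. It is not how sort is actually implemented: the names external_sort, chunk_size, and tmp_dir are made up for illustration, chunks are sized by line count rather than by a byte buffer, and lines are compared by code point rather than by locale.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, chunk_size=100_000, tmp_dir=None):
    """Yield newline-terminated strings from `lines` in sorted order.

    Hypothetical sketch: only `chunk_size` lines are held in memory while
    the sorted runs are being built. Every input line is assumed to end
    with a newline, otherwise two lines would be glued together on disk.
    """
    lines = iter(lines)
    run_paths = []
    try:
        # Phase 1: sort fixed-size chunks in memory, write each to a temp file.
        while True:
            chunk = sorted(itertools.islice(lines, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
            with os.fdopen(fd, "w") as f:
                f.writelines(chunk)
            run_paths.append(path)

        # Phase 2: stream a k-way merge over the sorted runs.
        files = [open(p) for p in run_paths]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary runs once the merge is done or abandoned.
        for p in run_paths:
            os.remove(p)
```

Note that tmp_dir=None makes tempfile pick its default location, which honors $TMPDIR and typically falls back to /tmp, so this sketch has the same weakness discussed in the rest of the post.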

Here's what I see if I run man sort and look at the documentation for --buffer-size:

use SIZE for main memory buffer

That's pretty terse! What does my Mac say?

Use size for the maximum size of the memory buffer. Size modifiers %,b,K,M,G,T,P,E,Z,Y can be used. If a memory limit is not explicitly specified, sort takes up to about 90% of available memory. If the file size is too big to fit into the memory buffer, the temporary disk files are used to perform the sorting.

Makes sense! But then the docs for --temporary-directory say:

use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories

And these days /tmp is often memory-backed, via tmpfs. This changed in Fedora 18 (2013) and Ubuntu 24.10 (2024), and is changing in Debian 13 (in a month or two).
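One way to check what a given machine does is to look up which mount a directory lives on. The sketch below is Linux-only and assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...). The helper names fs_type and tmp_fs_type are made up for illustration, and mount points containing escaped spaces are not handled.

```python
import os


def fs_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is text in the format of Linux's /proc/mounts. The
    longest matching mount point wins; among equal mount points the
    later line wins, since later mounts shadow earlier ones.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare with trailing slashes so "/tmp" does not match "/tmpfoo".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def tmp_fs_type(directory="/tmp"):
    """Look up `directory` in the live mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return fs_type(os.path.realpath(directory), f.read())
```

If tmp_fs_type() returns "tmpfs", sort's temporary files in /tmp are being held in memory (or swap) rather than on disk.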

It seems to me that these days it would be better for --temporary-directory to default to /var/tmp, which is preserved across reboots and so will generally be backed by disk even on systems that use tmpfs for /tmp. In the meantime, sort --temporary-directory /var/tmp will do the trick.
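If sort is being called from a script, the same workaround can be applied there. This is a small sketch: sort_command and sort_on_disk are hypothetical helper names, -T is the short form of --temporary-directory, and -o names the output file.

```python
import subprocess


def sort_command(in_path, out_path, tmp_dir="/var/tmp"):
    """Build a sort invocation that keeps temporary files in `tmp_dir`."""
    return ["sort", "-T", tmp_dir, "-o", out_path, in_path]


def sort_on_disk(in_path, out_path, tmp_dir="/var/tmp"):
    """Run sort, raising CalledProcessError if it exits with an error."""
    subprocess.run(sort_command(in_path, out_path, tmp_dir), check=True)
```

Since the documentation quoted above says sort also consults $TMPDIR, exporting TMPDIR=/var/tmp should have a similar effect, though that changes the temporary directory for other programs too.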
