::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact

Sorting mixed lists of numbers and strings

June 26th, 2009
programming  [html]
Imagine you have this list:
fname_0006.v0_word 2
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0005.v0_word 24
Imagine further that you want to sort it, alphabetically by the first field and then numerically by the second. Unfortunately, I can't get GNU sort to let me specify which fields are numeric and which aren't. That is, I can do:
$ cat file.txt | sort
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
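Plain `sort` compares lines as text, so of the two `fname_0005.v0_word` lines the one ending in `24` comes first: the first characters that differ are `2` and `8`. A minimal Python illustration of comparing the same digits as strings versus as integers. It assumes Python's code-point string comparison is close enough to `sort` in the C locale; other locales can collate differently.

```python
# Compare the same two values as text and as numbers.
print("24" < "8")            # True: '2' < '8' decides it
print(int("24") < int("8"))  # False: 24 > 8

# The two fname_0005 lines from the example, sorted as whole strings.
lines = ["fname_0005.v0_word 8", "fname_0005.v0_word 24"]
print(sorted(lines))
```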
Or I can do:
$ cat file.txt | sort -n
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
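`-n` changes nothing here because, without `-k`, it reads a number from the start of each whole line. Every line starts with `fname`, which has no leading digits, and GNU sort treats an empty number as zero, so all lines tie. As a last resort GNU sort then compares entire lines as text, which reproduces the plain `sort` order. A rough Python emulation, where `leading_number` is a hypothetical helper that only handles optional blanks, an optional minus sign, and digits (real `-n` also accepts a decimal fraction and locale thousands separators):

```python
import re

def leading_number(line):
    # Hypothetical helper approximating `sort -n` on a whole line:
    # optional blanks, an optional '-', then digits. No digits counts as 0.
    match = re.match(r"\s*(-?\d+)", line)
    return int(match.group(1)) if match else 0

lines = [
    "fname_0006.v0_word 2",
    "fname_0001.v0_word 15",
    "fname_0005.v0_word 8",
    "fname_0005.v0_word 24",
]

# Every line starts with "fname", so every numeric key is 0 and the
# whole-line text comparison decides the order, as in plain `sort`.
by_numeric_prefix = sorted(lines, key=lambda line: (leading_number(line), line))
print(by_numeric_prefix)
```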
Or I can do:
$ cat file.txt | sort -n -k1,1 -k2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
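Here the global `-n` is inherited by both keys. Field 1 has no leading digits, so it counts as 0 on every line and never separates anything; the order comes from field 2 alone, which is why the file names end up scrambled. What we actually want is a different comparison per key: text for field 1, then numbers for field 2. For reference, the GNU coreutils manual writes ordering letters for an individual key after its field range, as in `sort -k1,1 -k2,2n`. A Python sketch contrasting the two behaviours, assuming exactly two whitespace-separated fields with an integer second field (`fields` is a hypothetical helper):

```python
lines = [
    "fname_0006.v0_word 2",
    "fname_0007.v0_word 12",
    "fname_0001.v0_word 15",
    "fname_0005.v0_word 8",
    "fname_0005.v0_word 24",
]

def fields(line):
    # Split into (name, number) with the number converted to an int.
    name, number = line.split()
    return name, int(number)

# Like `sort -n -k1,1 -k2,2`: field 1 parses as 0 for every line,
# so only the number in field 2 affects the order.
numeric_everywhere = sorted(lines, key=lambda line: (0, fields(line)[1]))

# Text order on field 1, then numeric order on field 2: the intended result.
per_field = sorted(lines, key=fields)

print(numeric_everywhere)
print(per_field)
```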
You might think this would work:
$ cat file.txt | sort -k1,1 -kn2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
But nothing seems to make it do the right thing, so I abandoned sort for Python:
$ cat simple_sorter.py
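import fileinput

def tidy(x):
    # Turn numeric fields into ints so they compare as numbers;
    # leave everything else as a string.
    try:
        return int(x)
    except ValueError:
        return x

line_bits = []

for line in fileinput.input():
    line_bits.append([tidy(field) for field in line.split()])

# Lists compare element by element, so this sorts on the first field,
# then the second, and so on.
for bits in sorted(line_bits):
    print(" ".join(str(bit) for bit in bits))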

$ cat file.txt | python simple_sorter.py
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
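The script relies on every column holding a single type, as in this file. Python 2 would also compare an `int` with a `str` if a column mixed them, but Python 3 refuses and `sorted` raises `TypeError`. One way around that, sketched here as an alternative rather than the method above, is to tag each field so numbers and text never meet directly: `(0, number)` sorts before `(1, text)`. The names `tag_field` and `sort_key` and the `n/a` sample line are hypothetical.

```python
def tag_field(field):
    # Tag each field with its kind so ints and strs are never compared
    # directly: (0, n) for integers, (1, s) for everything else.
    try:
        return (0, int(field))
    except ValueError:
        return (1, field)

def sort_key(line):
    return [tag_field(field) for field in line.split()]

lines = [
    "fname_0005.v0_word 24",
    "fname_0005.v0_word 8",
    "fname_0005.v0_word n/a",  # hypothetical non-numeric value
    "fname_0001.v0_word 15",
]

result = sorted(lines, key=sort_key)
for line in result:
    print(line)
```

Sorting the original lines with a key, instead of rebuilding them from converted fields, also leaves each line's text untouched; the script above would print a field like `007` as `7`.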

Update 2013-08-22: Thinking about it now, if I had to do this at the terminal I would do:

$ cat file.txt | awk '{print $1, $2+1000}' | sort | awk '{print $1, $2-1000}'
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
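Adding 1000 (or any power of ten larger than your biggest number) gives every value the same number of digits, so sorting the lines as text also puts the numbers in numeric order; the second awk then subtracts it again. This assumes the numbers are non-negative. It's basically decorate-sort-undecorate.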
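The same decorate-sort-undecorate steps in Python, as a sketch mirroring the awk pipeline: add an offset to the number, sort the decorated lines as plain strings, then subtract the offset. `OFFSET`, `decorate`, and `undecorate` are hypothetical names, and the sketch assumes non-negative integers smaller than `OFFSET`.

```python
OFFSET = 1000  # assumed: a power of ten larger than every value in field 2

def decorate(line):
    # Like awk '{print $1, $2+1000}': every number gets the same digit count.
    name, number = line.split()
    return f"{name} {int(number) + OFFSET}"

def undecorate(line):
    # Like awk '{print $1, $2-1000}': restore the original number.
    name, number = line.split()
    return f"{name} {int(number) - OFFSET}"

lines = [
    "fname_0005.v0_word 24",
    "fname_0005.v0_word 8",
    "fname_0001.v0_word 15",
]

# Plain string sort on the decorated lines, as `sort` would do.
result = [undecorate(line) for line in sorted(decorate(line) for line in lines)]
for line in result:
    print(line)
```

Within Python itself a `key=` function makes the offset unnecessary; the trick earns its keep in a shell pipeline, where `sort` only sees text.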
