::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact

Sorting mixed lists of numbers and strings

June 26th, 2009
programming  [html]
Imagine you have this list:
fname_0006.v0_word 2
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0005.v0_word 24
Imagine further that you want to sort it. Unfortunately, I can't get GNU sort to let me specify which fields are numeric and which are not. That is, I can do:
$ cat file.txt | sort
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
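The 24-before-8 ordering is what plain string comparison gives: characters are compared left to right, and "2" comes before "8". A small Python sketch of the same effect, using just the two fname_0005 lines from the list (the variable names are made up for this example):

```python
lines = ["fname_0005.v0_word 8", "fname_0005.v0_word 24"]

# The two lines are identical up to the space, so the first differing
# character decides: "2" < "8", and the line ending in 24 sorts first.
as_strings = sorted(lines)

for line in as_strings:
    print(line)
```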
Or I can ask for a numeric sort, which changes nothing here because no line starts with a number:
$ cat file.txt | sort -n
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
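Or I can treat both fields as numeric keys, which ignores the filenames (they don't start with a number) and orders by the second field alone: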
$ cat file.txt | sort -n -k1,1 -k2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
You might think this would work:
$ cat file.txt | sort -k1,1 -kn2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
But nothing seems to make it do the right thing. So I abandoned sort for Python:
$ cat simple_sorter.py
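import fileinput

def tidy(x):
    # Turn numeric fields into ints so they compare as numbers, not strings.
    try:
        return int(x)
    except ValueError:
        return x

line_bits = []

# Read from the files named on the command line, or from stdin if there are none.
for line in fileinput.input():
    line_bits.append([tidy(field) for field in line.split()])

# Lists compare element by element: filename first, then the integer.
for bits in sorted(line_bits):
    print(" ".join(str(bit) for bit in bits))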

$ cat file.txt | python simple_sorter.py
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
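Two caveats with the script above: it rebuilds each line from its split fields, so the original spacing is lost, and under Python 3 comparing an int with a str raises TypeError, which would happen if a column were numeric on some lines and not on others. One possible variant, sketched here as an assumption rather than anything the script did, leaves each line intact and sorts with a key function. The names `field_key` and `line_key` are hypothetical.

```python
def field_key(field):
    # Tag each field so ints only ever compare with ints and strs with strs:
    # (0, number) for numeric fields, (1, text) for everything else.
    try:
        return (0, int(field))
    except ValueError:
        return (1, field)

def line_key(line):
    # One tagged key per whitespace-separated field, compared left to right.
    return [field_key(field) for field in line.split()]

lines = [
    "fname_0006.v0_word 2",
    "fname_0005.v0_word 24",
    "fname_0005.v0_word 8",
    "fname_0001.v0_word 15",
]

for line in sorted(lines, key=line_key):
    print(line)
```

Because the key is only used for comparison, the lines printed are the original strings, untouched.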

Update 2013-08-22: Thinking now, if I had to do it on the terminal I would do:

$ cat file.txt | awk '{print $1, $2+1000}' | sort | awk '{print $1, $2-1000}'
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
Adding 1000 (or any power of ten larger than your biggest number) gives every number the same count of digits, so a plain string sort orders them numerically. It's basically decorate-sort-undecorate.
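The same decorate-sort-undecorate idea can be sketched in Python. Instead of adding 1000, this version zero-pads the number to a fixed width, which makes string order agree with numeric order in the same way. The width of 4 and the helper names `decorate` and `undecorate` are assumptions for this example, and it expects exactly two fields per line with a non-negative number in the second.

```python
WIDTH = 4  # assumed: at least as many digits as the largest number

def decorate(line):
    # "fname_0005.v0_word 8" -> "fname_0005.v0_word 0008"
    name, number = line.split()
    return "%s %s" % (name, number.zfill(WIDTH))

def undecorate(line):
    # "fname_0005.v0_word 0008" -> "fname_0005.v0_word 8"
    name, padded = line.split()
    return "%s %d" % (name, int(padded))

lines = [
    "fname_0006.v0_word 2",
    "fname_0005.v0_word 24",
    "fname_0005.v0_word 8",
    "fname_0006.v0_word 9",
]

# Decorate, sort as plain strings, then strip the padding again.
result = [undecorate(d) for d in sorted(decorate(l) for l in lines)]

for line in result:
    print(line)
```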
