Sorting mixed lists of numbers and strings
June 26th, 2009
programming, tech
Imagine you have a file like:
fname_0006.v0_word 2
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0005.v0_word 24
Imagine further that you want to sort it. Unfortunately, I can't get GNU
sort to let me specify which fields are numeric
and which not. That is, I can do:
$ cat file.txt | sort
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
Or I can do:
$ cat file.txt | sort -n
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
Or I can do:
$ cat file.txt | sort -n -k1,1 -k2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
You might think this would work:
$ cat file.txt | sort -k1,1 -kn2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24
But nothing seems to make it do the right thing. So I abandoned sort for python:
$ cat simple_sorter.py
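import fileinput

def tidy(x):
    # Convert numeric fields to ints so they compare numerically.
    try:
        return int(x)
    except ValueError:
        return x

# Split each line into fields, converting the numeric ones.
line_bits = []
for line in fileinput.input():
    line_bits.append([tidy(field) for field in line.split()])

# Lists compare element by element, so this sorts field by field.
for bits in sorted(line_bits):
    print(" ".join(str(bit) for bit in bits))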
$ cat tmp.txt | python simple_sorter.py
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
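The script above works here because each column is either all numbers or all strings. If a field could be numeric on some lines and not on others, Python 3 would refuse to compare an int with a str. A minimal sketch of one way around that, using a sort key that tags each field with its type; `field_key` and `sort_lines` are hypothetical names, and putting numeric fields ahead of non-numeric ones is my assumption, not something the original script does:

```python
def field_key(field):
    """Build a comparable key for one whitespace-separated field.

    Integer fields get rank 0 and compare by value; everything else
    gets rank 1 and compares as a plain string. The rank keeps ints and
    strings from ever being compared with each other directly.
    """
    try:
        return (0, int(field), "")
    except ValueError:
        return (1, 0, field)


def sort_lines(lines):
    """Sort lines field by field, treating integer fields numerically."""
    return sorted(lines, key=lambda line: [field_key(f) for f in line.split()])


if __name__ == "__main__":
    sample = [
        "fname_0006.v0_word 2",
        "fname_0007.v0_word 12",
        "fname_0005.v0_word 24",
        "fname_0005.v0_word 8",
    ]
    for line in sort_lines(sample):
        print(line)
```

Unlike the script above, this returns the original lines untouched rather than re-joining the fields, so the whitespace in each line is preserved.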
Update 2013-08-22: Thinking now, if I had to do it on the terminal I would do:
$ cat file | awk '{print $1, $2+1000}' | sort | awk '{print $1, $2-1000}'
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
Adding 1000 (or any power of ten with more digits than your biggest number)
gives every number the same width, so plain string sorting puts them in
numeric order. It's basically decorate-sort-undecorate.
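For comparison, here is a rough Python sketch of the same decorate-sort-undecorate steps as the awk pipeline. It assumes every line has exactly two fields and that the second is a non-negative integer smaller than the offset; `dsu_sort` and `offset` are hypothetical names chosen for illustration:

```python
def dsu_sort(lines, offset=1000):
    """Sort 'name number' lines with decorate-sort-undecorate.

    Assumes two whitespace-separated fields per line, with the second a
    non-negative integer smaller than `offset`.
    """
    # Decorate: add the offset so every number has the same digit count.
    decorated = []
    for line in lines:
        name, number = line.split()
        decorated.append("%s %d" % (name, int(number) + offset))

    # Sort: plain string comparison now agrees with numeric order.
    decorated.sort()

    # Undecorate: subtract the offset to recover the original numbers.
    result = []
    for entry in decorated:
        name, number = entry.split()
        result.append("%s %d" % (name, int(number) - offset))
    return result


if __name__ == "__main__":
    for line in dsu_sort(["a 24", "a 8", "b 2", "a 15"]):
        print(line)
```

Like the shell version, this sorts whole decorated lines as strings, so it only behaves well when the inputs meet the assumptions above.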