## AWK — How to do selective multiple column sorting?

Question!

In awk, how can I do this:

Input:

``````
1  a  f  1  12  v
2  b  g  2  10  w
3  c  h  3  19  x
4  d  i  4  15  y
5  e  j  5  11  z
``````

Desired output, sorted by the numerical value of `$5`:

``````
1  a  f  2  10  w
2  b  g  5  11  z
3  c  h  1  12  v
4  d  i  4  15  y
5  e  j  3  19  x
``````

Note that the sorting should only affect `$4`, `$5`, and `$6` (ordered by the value of `$5`); the first three columns of the table must stay as they are.

By : nawesita

Personally, I find using `awk` to safely sort arrays of columns rather tricky because often you will need to hold and sort on duplicate keys. If you need to selectively sort a group of columns, I would call `paste` for some assistance:

``````
paste -d ' ' <(awk '{ print $1, $2, $3 }' file.txt) <(awk '{ print $4, $5, $6 | "sort -k2,2n" }' file.txt)
``````

Results:

``````
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
``````
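The `<(...)` process substitution used with `paste` is a bash/ksh/zsh feature. Below is a minimal sketch of the same idea for a plain POSIX shell, assuming a temporary file is acceptable and `mktemp` is available. The function name `sort_right_columns` is hypothetical. The key `-k2,2n` limits the comparison to the second field and compares it numerically, so 9 sorts before 10.

```shell
# Hypothetical helper: reorder columns 4-6 by the numeric value of column 5,
# leaving columns 1-3 in their original order.
sort_right_columns() {
    input=$1
    tmp=$(mktemp) || return 1
    # Right-hand columns, sorted numerically on their 2nd field (the original $5).
    awk '{ print $4, $5, $6 }' "$input" | sort -k2,2n > "$tmp"
    # Left-hand columns arrive on stdin ("-") and are glued to the sorted right side.
    awk '{ print $1, $2, $3 }' "$input" | paste -d ' ' - "$tmp"
    rm -f "$tmp"
}
```

Calling `sort_right_columns file.txt` should print the same kind of table as the one-liner, without relying on shell-specific syntax.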
By : Steve

This can be done in pure `awk`, but as @steve said, it's not ideal. `gawk` has limited sort functions, and `awk` has no built-in sort at all. That said, here's a (rather hackish) solution using a compare function in `gawk`:

``````
$ cat text
1  a  f  1  12  v
2  b  g  2  10  w
3  c  h  3  19  x
4  d  i  4  15  y
5  e  j  5  11  z
$ cat doit.gawk
### Comparison function called by asort(): order elements by their second field.
function cmp(i1, v1, i2, v2,    a1, a2) {    # a1, a2 are local scratch arrays
    split(v1, a1); split(v2, a2);
    if (a1[2] > a2[2])      { return 1; }
    else if (a1[2] < a2[2]) { return -1; }
    else                    { return 0; }
}

### The left-hand side keeps input order; the right-hand side gets sorted.
{
    lhs[NR] = sprintf("%s %s %s", $1, $2, $3);
    rhs[NR] = sprintf("%s %s %s", $4, $5, $6);
}

END {
    asort(rhs, sorted, "cmp");    ### This calls the function we defined above.
    for (i = 1; i <= NR; i++) {   ### Step through the arrays and reassemble.
        printf("%s %s\n", lhs[i], sorted[i]);
    }
}
$ gawk -f doit.gawk text
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
``````
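For an `awk` that lacks `asort()`, the sort has to be written by hand. Here is a minimal sketch, assuming a POSIX `awk` and input small enough to hold in memory. It swaps in a plain insertion sort for gawk's `asort()`, and the function name `sort_cols_portable` is hypothetical. Insertion sort is quadratic in the number of lines, but it is stable, so rows with duplicate keys keep their input order.

```shell
# Hypothetical helper: same result as the gawk script, without asort().
sort_cols_portable() {
    awk '
    {
        lhs[NR] = $1 " " $2 " " $3      # columns that keep their input order
        rhs[NR] = $4 " " $5 " " $6      # columns that move together
        key[NR] = $5 + 0                # numeric sort key
    }
    END {
        # Stable insertion sort of rhs[] by key[], ascending.
        for (i = 2; i <= NR; i++) {
            k = key[i]; r = rhs[i]
            for (j = i - 1; j >= 1 && key[j] > k; j--) {
                key[j + 1] = key[j]
                rhs[j + 1] = rhs[j]
            }
            key[j + 1] = k
            rhs[j + 1] = r
        }
        for (i = 1; i <= NR; i++)
            print lhs[i], rhs[i]
    }' "$1"
}
```

It would be called as `sort_cols_portable text`.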

This keeps your entire input file in arrays, so that lines can be reassembled after the sort. If your input is millions of lines, this may be problematic.

Note that you might want to play with the `printf` and `sprintf` functions to set appropriate output field separators.
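One way to experiment with separators without editing the format strings is to rebuild each record with a chosen `OFS` afterwards. This is only a sketch: the helper name `respace` is hypothetical, and it assumes the separator contains no backslashes, because `awk -v` interprets escape sequences in the assigned value.

```shell
# Hypothetical helper: re-join the fields of each stdin line with the
# separator given as the first argument.
respace() {
    # Assigning $1 to itself forces awk to rebuild the record using OFS.
    awk -v OFS="$1" '{ $1 = $1; print }'
}
```

For example, `gawk -f doit.gawk text | respace '  '` should restore the two-space layout of the input. Inside the script itself, `print lhs[i], sorted[i]` would honour `OFS` in the same way, whereas `printf` always uses exactly what its format string says.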

Passing a comparison function to `asort()` requires gawk 4.0 or later. The gawk man page documents it together with `PROCINFO["sorted_in"]`, which controls the order in which `for (i in array)` visits elements.
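As a rough illustration of `PROCINFO["sorted_in"]`, the sketch below lets gawk walk the keys in numeric order of their values instead of calling `asort()`. It assumes gawk 4.0 or later. The function name `sort_cols_sorted_in` is hypothetical, and `"@val_num_asc"` is one of gawk's predefined traversal orders.

```shell
# Hypothetical helper: reorder columns 4-6 by column 5 using gawk's
# controlled array traversal.
sort_cols_sorted_in() {
    gawk '
    {
        lhs[NR] = $1 " " $2 " " $3
        rhs[NR] = $4 " " $5 " " $6
        key[NR] = $5                    # value that decides traversal order
    }
    END {
        # Visit key[] in ascending numeric order of its values.
        PROCINFO["sorted_in"] = "@val_num_asc"
        n = 0
        for (i in key)
            print lhs[++n], rhs[i]
    }' "$1"
}
```

Like the `asort()` version, this holds the whole input in memory.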

By : ghoti