Taking Substring from Column to grab text until 3rd occurrence of '/'

I have searched everywhere but couldn’t find a solution that matches my problem exactly.

In Bash, I have a tab-delimited file that can potentially have several million lines. The 27th column holds a string of colors delimited by forward slashes. My end goal is to trim the file’s 27th column so that only the first three colors stay and the rest of the colors in that column are cut out.

For example:

    column1.    column2.    column 3.    colors
        abc.        abc.         abc.    green/yellow/red/orange/blue 

should become:

    column1.    column2.   column 3.   colors
        abc.        abc.        abc.   green/yellow/red

I’ve been trying to accomplish this using awk, but I’m afraid I just can’t seem to get it to work. Here was what I attempted:

    awk 'NR>1 BEGIN{FS=OFS="\t"} {gsub(/^(?:[^\/]*[\/]){2}[^\/]*(.*)/,"",$27); print $0}' ${filename} > "${filename}.tmp" && mv "${filename}.tmp" "${filename}"

I’m very unfamiliar with regular expressions; this pattern is just what I got working on a regex builder site, and I’m not sure it is even correct. To clarify, all the other columns should remain as they are; I only want to trim the color column (column 27) so that only the first three colors remain. The file can get huge, so I was hoping to do this in a single command such as awk, if possible, to avoid slowing things down.

Solution:

Given:

    $ cat file
    column1.    column2.    column 3.   colors
    abc.    abc.    abc.    green/yellow/red/orange/blue

You can do:

    awk 'BEGIN{FS=OFS="\t"}
    $4~/\//{split($4,a,"/"); $4=a[1] "/" a[2] "/" a[3]} 1' file

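Set $4 to the column you want to change; for the file described in the question that would be $27. split() breaks the field on "/" into the array a, and the trailing 1 is an always-true pattern that prints every line, changed or not.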

Prints:

    column1.    column2.    column 3.   colors
    abc.    abc.    abc.    green/yellow/red
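
One caveat with the command above: a field that contains a slash but fewer than three colors (say green/yellow) would come out with a trailing slash, because a[3] is empty. The following is a minimal sketch of a variant that only rewrites the field when it holds more colors than you want to keep, and skips the header line as the attempt in the question did. The function name trim_colors and the awk variables col, keep and parts are made up for illustration; it assumes a tab-delimited file and a POSIX awk.

```shell
# Hypothetical helper: keep only the first N "/"-separated items of one column.
#   $1 = column number, $2 = number of items to keep, $3 = input file
trim_colors() {
    awk -v col="$1" -v keep="$2" 'BEGIN { FS = OFS = "\t" }
    NR > 1 {                              # leave the header line alone
        n = split($col, parts, "/")       # n = number of colors in the field
        if (n > keep) {                   # only rewrite fields that are too long
            out = parts[1]
            for (i = 2; i <= keep; i++) out = out "/" parts[i]
            $col = out
        }
    }
    1' "$3"                               # always-true pattern: print every line
}
```

For the file in the question this could be called as trim_colors 27 3 "$filename" > "$filename.tmp" && mv "$filename.tmp" "$filename", which is still a single pass over the file.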
