Taking Substring from Column to grab text until 3rd occurrence of '/'

I have searched everywhere but couldn't find a solution that matches my problem exactly.

In Bash, I have a file that is tab delimited. It can potentially have several million lines. In the 27th column there is a string of colors which is delimited by a forward slash. My end goal is for the file’s 27th column to be trimmed such that only the first three colors stay and the rest of the colors in that column are cut out.

For example:

    column1.    column2.    column 3.    colors
        abc.        abc.         abc.    green/yellow/red/orange/blue 

should become:

    column1.    column2.   column 3.   colors
        abc.        abc.        abc.   green/yellow/red

I've been trying to accomplish this using awk, but I can't get it to work. Here is what I attempted:

awk 'NR>1 BEGIN{FS=OFS="\t"} {gsub(/^(?:[^\/]*[\/]){2}[^\/]*(.*)/,"",$27); print $0}' ${filename} > "${filename}.tmp" && mv "${filename}.tmp" "${filename}"

I'm very unfamiliar with regular expressions. This pattern is just what I could get to work on a regex builder site, and I'm not sure it is even correct. To clarify, I want all the other columns to remain as they are and only the color column (column 27) trimmed so that the first 3 colors remain. The file can get huge, so I was hoping to do this in a single command such as awk to avoid slowing things down.

Solution:

Given:

$ cat file
column1.    column2.    column 3.   colors
abc.    abc.    abc.    green/yellow/red/orange/blue

You can do:

awk  'BEGIN{FS=OFS="\t"}
$4~/\//{split($4,a,"/"); $4=a[1] "/" a[2] "/" a[3]} 1' file 

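Set $4 to the column you want to change (27 in your file). Note that a field containing a slash but fewer than three colors gains trailing slashes, because the missing array elements are empty.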
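For the real file, the same split-based idea can be wrapped so that the column number is a parameter. The following is a minimal sketch and not part of the original answer: `trim_colors` is a hypothetical helper name, and its defaults (column 27, keep 3 entries) are taken from the question. It assumes the first line is a header to be left alone, and it only rewrites fields with more than three entries, so shorter lists pass through unchanged.

```shell
# trim_colors FILE [COL] [KEEP]
# Hypothetical helper: keep only the first KEEP slash-separated
# entries of column COL in a tab-delimited FILE (defaults: 27 and 3).
trim_colors() {
    awk -v col="${2:-27}" -v keep="${3:-3}" '
        BEGIN { FS = OFS = "\t" }
        NR > 1 {                              # assumption: line 1 is a header
            n = split($col, parts, "/")
            if (n > keep) {                   # leave short lists untouched
                trimmed = parts[1]
                for (i = 2; i <= keep; i++)
                    trimmed = trimmed "/" parts[i]
                $col = trimmed                # rebuilds the line using OFS
            }
        }
        { print }
    ' "$1"
}
```

To overwrite the file as in the question, one option is `trim_colors "$filename" > "$filename.tmp" && mv "$filename.tmp" "$filename"`.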

Prints:

column1.    column2.    column 3.   colors
abc.    abc.    abc.    green/yellow/red
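As for why the attempt in the question fails: `NR>1 BEGIN{...}` is a syntax error, the `(?:...)` group is PCRE syntax that awk's POSIX extended regular expressions do not support, and `gsub(..., "")` deletes the matched text rather than keeping it. If a regex-based version is preferred, a sketch using `match()` and `substr()` could look like the following. The function name `trim_colors_regex` and the header-skipping `NR > 1` condition are assumptions for illustration, not part of the original answer.

```shell
# trim_colors_regex FILE [COL]
# Hypothetical helper: cut column COL (default 27) just before its
# third "/" so that only the first three entries remain.
trim_colors_regex() {
    awk -v col="${2:-27}" '
        BEGIN { FS = OFS = "\t" }
        # The pattern matches from the start of the field up to and
        # including the third slash, so it only fires when there is
        # something after the third entry to remove.
        NR > 1 && match($col, /^[^\/]*\/[^\/]*\/[^\/]*\//) {
            $col = substr($col, 1, RLENGTH - 1)   # drop the third slash too
        }
        { print }
    ' "$1"
}
```

Fields with three or fewer colors contain no third slash, so they do not match and are printed as they are.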