How can I remove numbers after a specific character with sed?

I need to modify a .fasta file that looks like this:

>Contig_1;2
AGATC...
>Contig_2;345
AaGGC...
>Contig_3;22
GGAGA...

And transform it into something like:

>Contig_1
AGATC...
>Contig_2
AaGGC...
>Contig_3
GGAGA...

I tried doing the following, but it did not work as intended.

sed -i 's/;*\n/\n/g' file.fasta

Could someone give me some advice? Thanks!

Solution:

You can use

sed -i 's/;[^;]*$//' file.fasta

Here is a quick demo:

s='>Contig_1;2
AGATC...
>Contig_2;345
AaGGC...
>Contig_3;22
GGAGA...'
sed 's/;[^;]*$//' <<< "$s"

Output:

>Contig_1
AGATC...
>Contig_2
AaGGC...
>Contig_3
GGAGA...

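Note that sed reads the input line by line and strips the trailing newline before placing each line into the pattern space, so the \n in your command never has anything to match. (Since you are using GNU sed, you could make it read the whole file as a single record with -z, but that is not necessary here.) Also, ;* matches zero or more semicolons, not a semicolon followed by arbitrary characters.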
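To make the point about newlines concrete, here is a small sketch, assuming GNU sed (the -z option is a GNU extension). The variable names are only illustrative. The first command is the attempt from the question and leaves the input untouched; the second shows the -z route, which works but is more than this task needs.

```shell
# Two lines from the question's sample input.
s='>Contig_1;2
AGATC...'

# Original attempt: sed works line by line with the newline stripped,
# so \n never matches and nothing is replaced.
unchanged=$(printf '%s\n' "$s" | sed 's/;*\n/\n/g')
printf '%s\n' "$unchanged"

# GNU sed only: -z treats the whole input as one record, so \n can match.
with_z=$(printf '%s\n' "$s" | sed -z 's/;[0-9]*\n/\n/g')
printf '%s\n' "$with_z"
```

The first printf shows the header still ending in ;2, while the second shows it reduced to >Contig_1.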

The ;[^;]*$ pattern matches

  • ; – a semi-colon
  • [^;]* – any zero or more chars other than ; (if you need to make sure you match digits, replace with [0-9]* or [[:digit:]]*)
  • $ – end of the line (sed processes one line at a time, so this is the end of the pattern space).

Note you do not need the g flag here: the pattern is anchored with $, so it can match at most once per line.
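If you want to be stricter, a possible variant is to restrict the substitution to header lines with an address and to strip digits only. This is a sketch rather than a required change; the sample input below is hypothetical and includes a header without a semicolon and a sequence line that happens to contain one, just to show what stays untouched.

```shell
# Hypothetical input (not from the question's file).
input='>Contig_1;2
AGATC;7
>Contig_2
AaGGC'

# /^>/ applies the substitution only to lines starting with ">";
# [0-9]* removes only digits after the semicolon.
result=$(printf '%s\n' "$input" | sed '/^>/ s/;[0-9]*$//')
printf '%s\n' "$result"
```

Here only the first header changes (to >Contig_1); the AGATC;7 line is left alone because it does not start with >. Once the output looks right, the same expression can be used with -i on file.fasta.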
