How to remove the last character from a line only if it's a letter, in R or Linux

I have a list of ~28,000 gene transcripts, e.g.:

4R79.1b
4R79.2b
AC3.1a
AC3.2
AC3.3
AC3.5a

I need to get gene names by removing the last character only if it's a letter, so 4R79.1b becomes 4R79.1 while AC3.2 stays unchanged. I've been googling for days and haven't found a solution that would remotely help; I must have missed something.

I thought there must be a simple solution, but my best attempt was sed 's/[[:alpha:]]$//' transcripts.txt > genes.txt. It did not seem to do anything: the output file is the same size as the original.

Please help??

Thank you so much.

Solution:

With awk:

$ echo '4R79.1b 4R79.2b AC3.1a AC3.2 AC3.3 AC3.5a' | 
awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1'   

Prints:

4R79.1 4R79.2 AC3.1 AC3.2 AC3.3 AC3.5 

Or sed:

sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g'
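Unlike the anchored `[[:alpha:]]$` from the question, this pattern also matches a letter that is followed by whitespace, and `[[:space:]]` includes the carriage return. If the transcript file was saved with Windows (CRLF) line endings, which is only an assumption since the question does not show the raw bytes, that would explain why the anchored attempt changed nothing. A minimal sketch comparing the two patterns on made-up CRLF input:

```shell
# Assumed scenario: a transcript list saved with CRLF line endings.
workdir=$(mktemp -d)
printf '4R79.1b\r\nAC3.2\r\n' > "$workdir/crlf.txt"

# Anchored pattern from the question: each line ends in \r, not a letter,
# so nothing is removed.
anchored=$(sed 's/[[:alpha:]]$//' "$workdir/crlf.txt" | tr -d '\r')

# Pattern from the answer: [[:space:]] matches the \r, so the letter is removed.
spaced=$(sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' "$workdir/crlf.txt" | tr -d '\r')

printf 'anchored:\n%s\nspaced:\n%s\n' "$anchored" "$spaced"
```

The `tr -d '\r'` at the end of each pipeline only strips the carriage returns so the two results are easy to compare.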

For a new file, just redirect:

sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' file > new_file
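The question's file has one transcript per line rather than one space-separated line, and the same command handles that layout too. A small sketch, using the sample names from the question and the hypothetical file names `transcripts.txt` and `genes.txt` inside a scratch directory:

```shell
# Build a sample file with one transcript per line (names from the question).
workdir=$(mktemp -d)
printf '%s\n' 4R79.1b 4R79.2b AC3.1a AC3.2 AC3.3 AC3.5a > "$workdir/transcripts.txt"

# Remove a single trailing letter from each whitespace-delimited token
# and write the result to a new file, leaving the input untouched.
sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' "$workdir/transcripts.txt" > "$workdir/genes.txt"

cat "$workdir/genes.txt"
```

Names that already end in a digit, such as AC3.2, pass through unchanged.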

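If you want to edit the file in place, you can use sed. The attached suffix form -i.bak is accepted by both GNU and BSD sed and keeps a backup as file.bak:

sed -i.bak -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' file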

Or use awk, redirecting to a temporary file and then overwriting the original (which is essentially what sed -i does):

awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' file > TEMP_FILE && mv -f TEMP_FILE file
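A fixed name like TEMP_FILE can collide with an existing file, so one possible variation is to let `mktemp` pick the name. This is a sketch, not part of the original answer; the sample file and the `workdir`, `file`, and `tmp` variables are illustrative:

```shell
# Sample input in a scratch directory (hypothetical file name).
workdir=$(mktemp -d)
file="$workdir/transcripts.txt"
printf '%s\n' 4R79.1b AC3.2 AC3.5a > "$file"

# Create the temp file next to the target so mv stays on one filesystem.
tmp=$(mktemp "$file.XXXXXX") || exit 1

# Overwrite the original only if awk succeeded; otherwise discard the temp file.
awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' "$file" > "$tmp" \
  && mv -f "$tmp" "$file" \
  || rm -f "$tmp"

cat "$file"
```

Note that `mktemp` creates the file with owner-only permissions, so the replaced file may not keep the mode of the original.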

You can also use GNU awk, which has an in-place option as well (gawk -i inplace).
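For reference, a hedged sketch of the GNU awk route, assuming gawk 4.1 or later (where the bundled `inplace` extension is available). The sample file and the `workdir` and `file` variables are illustrative:

```shell
# Sample input in a scratch directory (hypothetical file name).
workdir=$(mktemp -d)
file="$workdir/transcripts.txt"
printf '%s\n' 4R79.1b AC3.2 AC3.5a > "$file"

if command -v gawk >/dev/null 2>&1; then
  # -i inplace loads gawk's inplace extension, which rewrites the file itself.
  gawk -i inplace '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' "$file"
  cat "$file"
else
  echo "gawk is not installed; use the temp-file approach instead" >&2
fi
```

Unlike `sed -i.bak`, this form keeps no backup, so it is worth trying it on a copy first.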
