Why does the part I want printed come out with the last line duplicated in Perl?

I have a UniProt document containing a protein sequence as well as some metadata. I need to use Perl to match the sequence and print it out, but for some reason the last line always comes out twice. The code I wrote is here:

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {

if($_=~m /^\s+(\D+)/) {   #this is the pattern I used to match the sequence in the document
  $seq=$1;
  $seq=~s/\s//g;}         #removing the spaces from the sequence

  print $seq;  
}

I instead tried $seq.=$1; but it printed out the sequence 4.5 times. I'm sure I have made a mistake here, but I'm not sure what. Here is the input file: https://www.uniprot.org/uniprot/P30988.txt

Solution:

Here is your code, reformatted with consistent indentation and spacing around operators so it is clearer which scope each statement runs in.

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {

    if ($_ =~ m /^\s+(\D+)/) {   
        $seq = $1;
        $seq =~ s/\s//g;
    }   

    print $seq;  
}

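The placement of the print statement means that $seq is printed for every line of the input file, including lines that don't match the regex. A UniProt flat-file entry ends with a // terminator line after the sequence. That line doesn't match, so $seq still holds the last sequence line and it is printed a second time.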
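To see the effect in isolation, here is a minimal sketch that runs the same loop logic over a few in-memory lines instead of the file. The sample lines are made up for illustration (they only mimic the layout of a UniProt entry), and `printed_per_line` is a hypothetical helper name, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the original loop logic over a list of lines
# and returns what would be printed on each pass through the loop.
sub printed_per_line {
    my @lines = @_;
    my $seq;
    my @printed;
    for my $line (@lines) {
        if ($line =~ m/^\s+(\D+)/) {
            $seq = $1;
            $seq =~ s/\s//g;
        }
        # Same placement as the original print: runs for every line,
        # whether or not the regex matched.
        push @printed, defined $seq ? $seq : '';
    }
    return @printed;
}

# Made-up sample: a header line, two sequence lines, and the terminator.
my @sample = (
    "SQ   SEQUENCE   20 AA;\n",
    "     MRFTFTSRCL ALFLL\n",
    "     NHPTP\n",
    "//\n",
);

print join("|", printed_per_line(@sample)), "\n";
# prints: |MRFTFTSRCLALFLL|NHPTP|NHPTP
```

The header line prints nothing because $seq is not set yet, and the final // line repeats the last stored value.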

I suspect you want this:

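#!/usr/bin/perl
use strict;
use warnings;

# three-argument open with a lexical filehandle, and fail loudly if the file is missing
open(my $in, '<', 'P30988.txt') or die "Cannot open P30988.txt: $!";
while (<$in>) {

    if ($_ =~ m/^\s+(\D+)/) {
        my $seq = $1;
        $seq =~ s/\s//g;

        # only print $seq for lines that match /^\s+(\D+)/
        # a newline is also added to make it easier to debug

        print $seq . "\n";
    }
}
close($in);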

When I run that, I get this:

MRFTFTSRCLALFLLLNHPTPILPAFSNQTYPTIEPKPFLYVVGRKKMMDAQYKCYDRMQ 
QLPAYQGEGPYCNRTWDGWLCWDDTPAGVLSYQFCPDYFPDFDPSEKVTKYCDEKGVWFK 
HPENNRTWSNYTMCNAFTPEKLKNAYVLYYLAIVGHSLSIFTLVISLGIFVFFRSLGCQR 
VTLHKNMFLTYILNSMIIIIHLVEVVPNGELVRRDPVSCKILHFFHQYMMACNYFWMLCE 
GIYLHTLIVVAVFTEKQRLRWYYLLGWGFPLVPTTIHAITRAVYFNDNCWLSVETHLLYI 
IHGPVMAALVVNFFFLLNIVRVLVTKMRETHEAESHMYLKAVKATMILVPLLGIQFVVFP 
WRPSNKMLGKIYDYVMHSLIHFQGFFVATIYCFCNNEVQTTVKRQWAQFKIQWNQRWGRR 
PSNRSARAAAAAAEAGDIPIYICHQELRNEPANNQGEESAEIIPLNIIEQESSA 
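
If the goal is the full sequence as a single string, which is what the `$seq .= $1` attempt in the question was heading towards, one possible sketch is to append inside the `if` and print only once, after the loop has finished. Printing inside the loop would emit the growing string on every pass. The helper name `read_sequence` is hypothetical, and the sketch assumes that, as in the output above, the only lines starting with whitespace are sequence lines.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from an open filehandle and returns the
# concatenated sequence with all whitespace removed.
sub read_sequence {
    my ($fh) = @_;
    my $seq = '';
    while (my $line = <$fh>) {
        if ($line =~ m/^\s+(\D+)/) {
            my $chunk = $1;
            $chunk =~ s/\s//g;
            $seq .= $chunk;    # append instead of overwrite
        }
    }
    return $seq;
}

# Usage with the file from the question (skipped if the file is absent).
my $file = 'P30988.txt';
if (-e $file) {
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    print read_sequence($in), "\n";    # printed once, after the loop
    close($in);
}
```

Keeping the print outside the loop is the design choice that matters here: the loop only collects, and the output happens once the whole file has been read.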