Home Why does the part I want printed out comes out with the last line duplicated in perl?

Questions

Why does the part I want printed out comes out with the last line duplicated in perl?

November 30, 2022

I have a uniprot document with a protein sequence as well as some metadata. I need to use perl to match the sequence and print it out but for some reason the last line always comes out two times. The code I wrote is here

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {

if($_=~m /^\s+(\D+)/) {   #this is the pattern I used to match the sequence in the document
  $seq=$1;
  $seq=~s/\s//g;}         #removing the spaces from the sequence

  print $seq;  
}

I instead tried $seq.=$1; but it printed out the sequence 4.5 times. Im sure i have made a mistake here but not sure what. Here is the input file https://www.uniprot.org/uniprot/P30988.txt

>Solution :

Here is your code reformatted and extra whitespace added between operators to make it clearer what scope the statements are running in.

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {

    if ($_ =~ m /^\s+(\D+)/) {   
        $seq = $1;
        $seq =~ s/\s//g;
    }   

    print $seq;  
}

The placement of the print command means that $seq will be printed for every line from the input file — even those that don’t match the regex.

I suspect you want this

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {

    if ($_ =~ m /^\s+(\D+)/) {   
        $seq = $1;
        $seq =~ s/\s//g;

        # only print $seq for lines that match with /^\s+(\D+)/
        # Also - added a newline to make it easier to debug

        print $seq . "\n";
    } 
}

When I run that I get this

MRFTFTSRCLALFLLLNHPTPILPAFSNQTYPTIEPKPFLYVVGRKKMMDAQYKCYDRMQ 
QLPAYQGEGPYCNRTWDGWLCWDDTPAGVLSYQFCPDYFPDFDPSEKVTKYCDEKGVWFK 
HPENNRTWSNYTMCNAFTPEKLKNAYVLYYLAIVGHSLSIFTLVISLGIFVFFRSLGCQR 
VTLHKNMFLTYILNSMIIIIHLVEVVPNGELVRRDPVSCKILHFFHQYMMACNYFWMLCE 
GIYLHTLIVVAVFTEKQRLRWYYLLGWGFPLVPTTIHAITRAVYFNDNCWLSVETHLLYI 
IHGPVMAALVVNFFFLLNIVRVLVTKMRETHEAESHMYLKAVKATMILVPLLGIQFVVFP 
WRPSNKMLGKIYDYVMHSLIHFQGFFVATIYCFCNNEVQTTVKRQWAQFKIQWNQRWGRR 
PSNRSARAAAAAAEAGDIPIYICHQELRNEPANNQGEESAEIIPLNIIEQESSA