Find all rows between two specific column values in a pandas DataFrame (Python)

I have a huge dataframe to work with. I want to write only the exons to an output file.
Not all exons, only the exons of an mRNA inside a gene block.

I need to write a script in Python.

INPUT:

NC_048323.1 Gnomon  gene    25044   78977   .   +   .   ID=gene-LOC117420859;Dbxref=GeneID:117420859;Name=LOC117420859;gbkey=Gene;gene=LOC117420859;gene_biotype=protein_coding
NC_048323.1 Gnomon  mRNA    25044   78977   .   +   .   ID=rna-XM_034926345.1;Parent=gene-LOC117420859;Dbxref=GeneID:117420859,Genbank:XM_034926345.1;Name=XM_034926345.1;gbkey=mRNA;gene=LOC117420859;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon  exon    25044   25136   .   +   .   ID=exon-XM_034926345.1-1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon  exon    25929   26031   .   +   .   ID=exon-XM_034926345.1-2;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
 ....
NC_048323.1 Gnomon  CDS 76336   76521   .   +   0   ID=cds-XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon  CDS 78960   78977   .   +   0   ID=cds-XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon  gene    111664  172479  .   -   .   ID=gene-LOC117421266;Dbxref=GeneID:117421266;Name=LOC117421266;gbkey=Gene;gene=LOC117421266;gene_biotype=protein_coding
NC_048323.1 Gnomon  mRNA    111664  172479  .   -   .   ID=rna-XM_034035429.2;Parent=gene-LOC117421266;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;Name=XM_034035429.2;gbkey=mRNA;gene=LOC117421266;model_evidence=Supporting evidence includes similarity to: 13 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 7 samples with support for all annotated introns;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    172022  172479  .   -   .   ID=exon-XM_034035429.2-1;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    157760  157889  .   -   .   ID=exon-XM_034035429.2-2;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    131303  131497  .   -   .   ID=exon-XM_034035429.2-3;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    125107  125237  .   -   .   ID=exon-XM_034035429.2-4;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    124379  124607  .   -   .   ID=exon-XM_034035429.2-5;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    123710  123872  .   -   .   ID=exon-XM_034035429.2-6;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  CDS 114179  114352  .   -   0   ID=cds-XP_033891320.1;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XP_033891320.1;Name=XP_033891320.1;gbkey=CDS;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like isoform X2;protein_id=XP_033891320.1
NC_048323.1 Gnomon  mRNA    111664  172479  .   -   .   ID=rna-XM_034035428.2;Parent=gene-LOC117421266;Dbxref=GeneID:117421266,Genbank:XM_034035428.2;Name=XM_034035428.2;gbkey=mRNA;gene=LOC117421266;model_evidence=Supporting evidence includes similarity to: 13 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 10 samples with support for all annotated introns;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_034035428.2
NC_048323.1 Gnomon  exon    172022  172479  .   -   .   ID=exon-XM_034035428.2-1;Parent=rna-XM_034035428.2;Dbxref=GeneID:117421266,Genbank:XM_034035428.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_034035428.2
NC_048323.1 Gnomon  exon    157760  157889  .   -   .   ID=exon-XM_034035428.2-2;Parent=rna-XM_034035428.2;Dbxref=GeneID:117421266,Genbank:XM_034035428.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_034035428.2
NC_048323.1 Gnomon  gene    111664  172479  .   -   .   ID=gene-LOC117421266;Dbxref=GeneID:117421266;Name=LOC117421266;gbkey=Gene;gene=LOC117421266;gene_biotype=protein_coding

OUTPUT: the matching exon rows

How can I do that?

df.loc[df['column_name'] == 'exon'] is not enough for me. There are rows in my dataframe that look like this:

NC_048323.1 Gnomon  gene    111664  172479  .   -   .   ID=gene-LOC117421266;Dbxref=GeneID:117421266;Name=LOC117421266;gbkey=Gene;gene=LOC117421266;gene_biotype=protein_coding
NC_048323.1 Gnomon  tRNA    111664  172479  .   -   .   ID=rna-XM_034035429.2;Parent=gene-LOC117421266;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;Name=XM_034035429.2;gbkey=mRNA;gene=LOC117421266;model_evidence=Supporting evidence includes similarity to: 13 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 7 samples with support for all annotated introns;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    172022  172479  .   -   .   ID=exon-XM_034035429.2-1;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2
NC_048323.1 Gnomon  exon    157760  157889  .   -   .   ID=exon-XM_034035429.2-2;Parent=rna-XM_034035429.2;Dbxref=GeneID:117421266,Genbank:XM_034035429.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X2;transcript_id=XM_034035429.2

I need only the exons that come after an mRNA row.

>Solution :

IIUC, you can use

out = df[(df['column_name'] == 'exon') & (df['column_name'] == 'mRNA').shift(fill_value=False)]
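
Note that the one-liner above keeps only the exon row that directly follows an mRNA row. If every exon belonging to an mRNA is wanted, something like the following sketch may help. It is not the answerer's method, just one possible reading of the question. The column names in GFF_COLUMNS, the helper names load_gff, exons_after_mrna and exons_with_mrna_parent, and the tiny SAMPLE data are all made up for illustration; the sketch also assumes a tab-separated GFF3-style file in which each exon has a single Parent value.

```python
import csv
import io

import pandas as pd

# Assumed column names for the nine GFF3 columns (the post only says 'column_name').
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]

# Made-up miniature data in the same layout as the rows in the question.
SAMPLE = "\n".join([
    "chr1\tGnomon\tgene\t1\t100\t.\t+\t.\tID=gene-A",
    "chr1\tGnomon\tmRNA\t1\t100\t.\t+\t.\tID=rna-A1;Parent=gene-A",
    "chr1\tGnomon\texon\t1\t40\t.\t+\t.\tID=exon-A1-1;Parent=rna-A1",
    "chr1\tGnomon\texon\t60\t100\t.\t+\t.\tID=exon-A1-2;Parent=rna-A1",
    "chr1\tGnomon\tCDS\t10\t40\t.\t+\t0\tID=cds-A1;Parent=rna-A1",
    "chr1\tGnomon\tgene\t200\t300\t.\t-\t.\tID=gene-B",
    "chr1\tGnomon\ttRNA\t200\t300\t.\t-\t.\tID=rna-B1;Parent=gene-B",
    "chr1\tGnomon\texon\t200\t300\t.\t-\t.\tID=exon-B1-1;Parent=rna-B1",
])


def load_gff(handle):
    """Read tab-separated GFF-like text, skipping blank and '#' header lines."""
    rows = [line for line in handle if line.strip() and not line.startswith("#")]
    return pd.read_csv(io.StringIO("".join(rows)), sep="\t", header=None,
                       names=GFF_COLUMNS, quoting=csv.QUOTE_NONE)


def exons_after_mrna(df):
    """Position-based: keep exons whose nearest preceding row that is
    neither exon nor CDS is an mRNA row."""
    is_child = df["type"].isin(["exon", "CDS"])
    # Blank out child rows, then carry the last parent-level type forward.
    block_type = df["type"].where(~is_child).ffill()
    return df[(df["type"] == "exon") & (block_type == "mRNA")]


def exons_with_mrna_parent(df):
    """ID-based: keep exons whose Parent attribute equals the ID of an mRNA row.
    Assumes a single Parent value per exon (no comma-separated lists)."""
    ids = df["attributes"].str.extract(r"(?:^|;)ID=([^;]+)", expand=False)
    parents = df["attributes"].str.extract(r"(?:^|;)Parent=([^;]+)", expand=False)
    mrna_ids = set(ids[df["type"] == "mRNA"].dropna())
    return df[(df["type"] == "exon") & parents.isin(mrna_ids)]


df = load_gff(io.StringIO(SAMPLE))
out = exons_after_mrna(df)                 # relies on row order
out_by_parent = exons_with_mrna_parent(df)  # relies on ID/Parent attributes
```

The position-based version depends on the file being sorted so that exons follow their transcript row; the ID-based version does not, so it may be the safer choice if the row order is uncertain. Either result can then be written out with out.to_csv(path, sep="\t", header=False, index=False), where path is whatever output file is wanted.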