Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter pandas dataframe rows based on multiple conditions

This is my main dataframe that I want to filter.

    first.seqnames  first.start first.end   first.width first.strand    second.seqnames second.start    second.end  second.width    second.strand
126457  chr1    10590184    10590618    GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...   *   chr1    10730773    10731207    GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...   *
126461  chr1    10590958    10591541    CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...   *   chr1    10731548    10732131    CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...   *
126544  chr1    10597414    10597918    ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...   *   chr1    10738018    10738522    ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...   *
126576  chr1    10600437    10600904    CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...   *   chr1    10741045    10741512    CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...   *
131172  chr1    11082133    11082593    TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...   *   chr1    11245253    11245713    TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...   *

This is my conditional dataframe based on which I will filter:

    Chrom   Start   End
0   chr1    10590184    10590618
1   chr1    10590958    10591541
2   chr1    10597414    10597918

I’ve tried the following logic to filter each row. But it’s wrong; it is not comparing each row.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

header_frame[header_frame['first.end'].isin(knee_df['End']) & header_frame['first.start'].isin(knee_df['Start'])]

I want only those rows in the 1st dataframe which exist in the 2nd dataframe.

>Solution :

Assuming df1 and df2 the two dataframes, you can inner merge:

df1.merge(df2,
          left_on=['first.seqnames', 'first.start', 'first.end'],
          right_on=['Chrom', 'Start', 'End'],
          how='inner'
         )[df1.columns]

output:

  first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
0           chr1     10590184   10590618  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...            *            chr1      10730773    10731207  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...             *
1           chr1     10590958   10591541  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...            *            chr1      10731548    10732131  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...             *
2           chr1     10597414   10597918  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...            *            chr1      10738018    10738522  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...             *
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading