Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter rows with two conditions In Dataframe in Pandas

I have dataframe:

import pandas as pd
data = {'text': ['I ran home', 'I went home', 'I looked at the cat', 'The cat looked at me'],
       'word_count':[3,3,4,5]}
        
df = pd.DataFrame(data)
df['len_text'] = df["text"].str.len()
    text             word_count len
0   I ran home              3   10
1   I went home             3   11
2   I looked at the cat     4   19
3   The cat looked at me    5   20

I want to filter rows with two conditions:
if the values in the word_count column are the same and if the value in the len_text column is greater than the next row, then leave the greater value.

So result will be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    text             word_count len
0   I went home             3   11
1   I looked at the cat     4   19
2   The cat looked at me    5   20

I tried to do this but it doesn’t work:

for i, row in df.iterrows():
    if (df['pub_count'][i] == df['pub_count'][i+1])&(df['len'][i] >= df['df'][i+1]):
        df = df.drop(i+1)

>Solution :

You can create groups by consecutive values in word_count and get indices by DataFrameGroupBy.idxmax, last select only these rows by DataFrame.loc:

g = df['word_count'].ne(df['word_count'].shift()).cumsum()
df = df.loc[df.groupby(g)['len_text'].idxmax()]
print (df)
                   text  word_count  len_text
1           I went home           3        11
2   I looked at the cat           4        19
3  The cat looked at me           5        20

Consecutive groups means if again e.g. group 3 are count separately:

data = {'text': ['I ran home', 'I went home', 'I looked at the cat',
                 'The cat looked at me','I ran home', 'I went homes'],
       'word_count':[3,3,4,5,3,3]}
        
df = pd.DataFrame(data)
df['len_text'] = df["text"].str.len()
print (df)
                   text  word_count  len_text
0            I ran home           3        10
1           I went home           3        11
2   I looked at the cat           4        19
3  The cat looked at me           5        20
4            I ran home           3        10
5          I went homes           3        12

g = df['word_count'].ne(df['word_count'].shift()).cumsum()
df1 = df.loc[df.groupby(g)['len_text'].idxmax()]
print (df1)
                   text  word_count  len_text
1           I went home           3        11
2   I looked at the cat           4        19
3  The cat looked at me           5        20
5          I went homes           3        12

vs.

df2 = df.loc[df.groupby('word_count')['len_text'].idxmax()]
print (df2)
                   text  word_count  len_text
5          I went homes           3        12
2   I looked at the cat           4        19
3  The cat looked at me           5        20
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading