Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filtering out rows based on other rows using pandas

I have a dataframe that looks like this:

dict = {'companyId': {0: 198236, 1: 198236, 2: 900814, 3: 153421, 4: 153421, 5: 337815},
 'region': {0: 'Europe', 1: 'Europe', 2: 'Asia-Pacific', 3: 'North America', 4: 'North America', 5:'Africa'},
 'value': {0: 560, 1: 771, 2: 964, 3: 217, 4: 433, 5: 680},
 'type': {0: 'actual', 1: 'forecast', 2: 'actual', 3: 'forecast', 4: 'actual', 5: 'forecast'}}

df = pd.DataFrame(dict)

    companyId     region          value  type
0   198236        Europe          560    actual
1   198236        Europe          771    forecast
2   900814        Asia-Pacific    964    actual
3   153421        North America   217    forecast
4   153421        North America   433    actual
5   337815        Africa          680    forecast

I can’t seem to figure out a way to filter out certain rows based on the following condition:

If there are two entries under the same companyId, as is the case for 198236 and 153421, I want to keep only the entry where type is actual.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

If there is only one entry under a companyId, as is the case for 337815 and 900814, I want to keep that row, irrespective of the value in column type.

Does anyone have an idea how to go about this?

>Solution :

You can use a groupby and transform to create boolean indexing:

#Your condition i.e. retain the rows which are not duplicated and those
# which are duplicated but only type==actual. Lets express that as a lambda.
to_filter = lambda x: (len(x) == 1) | ((len(x) > 1) & (x == 'actual'))

#then create a boolean indexing mask as below
m = df.groupby('companyId')['type'].transform(to_filter)


#then filter your df with that m:
df[m]:

   companyId         region  value      type
0     198236         Europe    560    actual
2     900814   Asia-Pacific    964    actual
4     153421  North America    433    actual
5     337815         Africa    680  forecast
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading