Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

sum dataframe based on values from another dataframe

I have two dataframes, lets called them A & B, which are basically matrices. Both are the same shape, 100 x 350.

Dataframe A has numerical values & dataframe B contains only boolean values.

What I want to do is sum the columns in dataframe A but only where the corresponding element cell in dataframe B is True. Please see the example below.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 Dataframe A              Dataframe B
 'ad'  'bc'  'de'         'ad'  'bc'  'de'
  2     3     6            True  False True
  1     1     3            True  True  True
  4     7     2            False True  True

desired output

 'ad'  'bc'  'de'
  3     8     11

I am currently looping through each column and indexing in & then summing. I imagine there are better ways though to do this?

>Solution :

As you are aggregating, it would make more sense to output a Series.

You just need to mask the False with where and aggregate the rows per column with sum

out = df1.where(df2).sum().astype(int)

output:

'ad'     3
'bc'     8
'de'    11
dtype: int64

If you really need a DataFrame:

df1.where(df2).sum().astype(int).to_frame().T

output:

   'ad'  'bc'  'de'
0     3     8    11
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading