Pandas or R – Merge Rows By Same Value in Column Over NaN values – Look at Example

I have a very specific dataset that looks something like this:

record_id event_id instrument repeat_inst
PI0005 v03_abc_1 NaN 1
PI0005 v03_abc_1 i_sensor NaN
PI0005 v03_abc_1 NaN NaN
PI0005 v02_abc_33 i_sensor NaN
PI0005 v02_abc_33 NaN NaN
PI0006 v02_abc_1 i_sensor 1
PI0006 v02_abc_1 NaN NaN

How do I make it look like this:

record_id event_id instrument repeat_inst
PI0005 v03_abc_1 i_sensor 1
PI0005 v02_abc_33 i_sensor NaN
PI0006 v02_abc_1 i_sensor 1

Rows with the same record_id and event_id should be merged into one row: a NaN is replaced by the non-NaN value from another row in the group, and if every value in the group is NaN, the NaN is kept (as with repeat_inst in the fourth and fifth rows of the original dataframe).

Assume that only one of the related cells has a value and all the others are NaN.

This should apply to every column of the data; there are thousands of columns and rows.

I tried using groupby, but I don't know how to continue.

Solution:

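With R, using dplyr:

library(dplyr)

# For each (record_id, event_id) group, keep the first non-NA value of every
# other column; a column that is all NA within the group stays NA.
df1 %>%
   group_by(record_id, event_id) %>%
   summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")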
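
Since the question also asks about pandas, here is a rough equivalent as a minimal sketch. It assumes the data is held in a DataFrame called df (a name chosen here for illustration and rebuilt from the sample in the question) and relies on GroupBy.first(), which by default returns the first non-null value of each column within a group and leaves NaN where the whole group is NaN.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; "df" is an illustrative name.
df = pd.DataFrame({
    "record_id": ["PI0005"] * 5 + ["PI0006"] * 2,
    "event_id": ["v03_abc_1"] * 3 + ["v02_abc_33"] * 2 + ["v02_abc_1"] * 2,
    "instrument": [np.nan, "i_sensor", np.nan, "i_sensor", np.nan, "i_sensor", np.nan],
    "repeat_inst": [1, np.nan, np.nan, np.nan, np.nan, 1, np.nan],
})

# first() skips NaN, so each group collapses to its first non-null value per
# column; sort=False keeps the groups in order of first appearance.
merged = (
    df.groupby(["record_id", "event_id"], sort=False, as_index=False)
      .first()
)

print(merged)
```

Because the question states that at most one row per group holds a value in any given column, taking the first non-null value should be enough; if several rows could hold different values, some other aggregation rule would be needed.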