I am trying to remove duplicate rows from my DataFrame and merge their data into the columns where values are NA/empty.
Example:
I have the following DataFrame, and I would like to remove all the duplicates in column A but merge the values from the rest of the columns:
| A | B | C | D | E |
|---|---|---|---|---|
| 1 | X | | | |
| 2 | X | | | |
| 2 | | X | | |
| 2 | | | X | |
| 3 | | X | | |
| 3 | | | X | |
| 2 | | | | X |
The expected output:
| A | B | C | D | E |
|---|---|---|---|---|
| 1 | X | | | |
| 2 | X | X | X | X |
| 3 | | X | X | |
How can I perform the above dynamically?
Thanks in advance for the answers
>Solution :
You can use `groupby` followed by `first`, because `first` returns the first non-null entry of each column within each group:
>>> df.groupby('A', as_index=False).first()
   A     B     C     D     E
0  1     X  None  None  None
1  2     X     X     X     X
2  3  None     X     X  None
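
For a self-contained version you can run, here is a minimal sketch. The DataFrame below is an assumed reconstruction of the question's data: each row holds a single `X`, and `None` marks an empty cell. The exact row layout is a guess, and `merged` is just an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of the question's data (row layout is a guess):
# one "X" per row, None for every empty cell.
df = pd.DataFrame({
    "A": [1, 2, 2, 2, 3, 3, 2],
    "B": ["X", "X", None, None, None, None, None],
    "C": [None, None, "X", None, "X", None, None],
    "D": [None, None, None, "X", None, "X", None],
    "E": [None, None, None, None, None, None, "X"],
})

# first() skips nulls, so each group in A collapses to the first
# non-null value found in every other column.
merged = df.groupby("A", as_index=False).first()
print(merged)
```

Note that `first` only skips real missing values. If your empty cells are empty strings rather than NA, they count as valid entries, so you would probably need to convert them beforehand, for example with `df.replace("", pd.NA)`.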