Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to join columns in a pandas dataframe, with empty values, considering priority?

I’m doing a university job with a pandas dataframe, as below:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
                         'col3': [np.NaN, 2, 30, 44, np.NaN]
                         })

      print(df)

      col1    sensor_name   col2    col3
      NaN      water         NaN    NaN
      11.2     strain        NaN    2.0
      10.0     fog           30.0   30.0
      NaN      water         40.0   44.0
      1000.0   fog           NaN    NaN

I would like to join columns 1, 2 and 3, to avoid the NaN values.
The priority would be the value in col1, if any. Then the priority would be ‘col2’ and finally ‘col3’.

I tried to do the following code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

      df['new_column'] = df.ffill(axis=1)['col3']

The output is:

      col1     sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       water
       11.2     strain          NaN     2.0       2.0
       10.0     fog             30.0    30.0      30.0
       NaN      water           40.0    44.0      44.0
       NaN      fog             NaN     NaN       fog

However, the desired output is:

       col1    sensor_name      col2    col3    new_column
       NaN      water           NaN     NaN       NaN
       11.2     strain          NaN     2.0       11.2
       10.0     fog             30.0    30.0      10.0
       NaN      water           40.0    44.0      40.0
       1000.0   fog             NaN     NaN       1000.0

>Solution :

One of the many ways you can achieve is using pandas.apply function.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
                         'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
                         'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
                         'col3': [np.NaN, 2, 30, 44, np.NaN]
                         })

print(df)

def apply_func(row):
  if not pd.isna(row['col1']):
    return row['col1']
  elif not pd.isna(row['col2']):
    return row['col2']
  return row["col3"]

df["new_cols"]=df.apply(apply_func,axis=1)
print(df)

Output:

     col1 sensor_name  col2  col3  new_cols
0     NaN       water   NaN   NaN       NaN
1    11.2      strain   NaN   2.0      11.2
2    10.0         fog  30.0  30.0      10.0
3     NaN       water  40.0  44.0      40.0
4  1000.0         fog   NaN   NaN    1000.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading