Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

create new column among multiple columns value

I have a question about how to create a new column based on several column value.

Input:

col1 col2 col3 
1 1 1 
NULL NULL NULL 
2 NULL 2 
NULL NULL 3 
4 NULL NULL 
5 5 NULL 

Output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

col1 col2 col3 new
1 1 1 1
NULL NULL NULL NULL
2 NULL 2 2
NULL NULL 3 3
4 NULL NULL 4
5 5 NULL 5

I am trying to use combine_first, but it seems not be a good choice since I have multiple columns need to combine.

>Solution :

One option is to rename the columns to have the same name; then use groupby + first:

df['new'] = (df.rename(columns={col: 'col' for col in df.columns})
             .groupby(level=0, axis=1).first())

You could also iteratively use combine_first:

df['new'] = float('nan')
for c in df.columns:
    df['new'] = df['new'].combine_first(df[c])

Or you could apply a lambda that selects non-NaN values row-wise (works for Python>=3.8 since it uses the walrus operator; could write the same function differently if you have Python<3.8):

df['new'] = df.apply(lambda x: res[0] if (res:=x[x.notna()].tolist()) else float('nan'), axis=1)

Output:

   col1  col2  col3  new
0   1.0   1.0   1.0  1.0
1   NaN   NaN   NaN  NaN
2   2.0   NaN   2.0  2.0
3   NaN   NaN   3.0  3.0
4   4.0   NaN   NaN  4.0
5   5.0   5.0   NaN  5.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading