Say I have a df like this:
col1 col2
0. [1, 2, 3] [4, 5, 6]
1 [4, 6] [43]
2. [7, 11] NaN
3. None [9, 9]
4. [] [13, 17]
I want to combine the two columns into one combined list. For NaN and None values I would like to ignore them.
Expected results:
col1 col2 col3
0. [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] 43 [4, 6, 43]
2. [7, 11] NaN [7, 11]
3. None [9, 9] [9, 9]
4. [] [13, 17] [13, 17]
5. NaN NaN NaN
I’ve looked at numpy and pandas functions and am unable to do this. Help.
>Solution :
Solution with remove missing values if not exist in both columns with DataFrame.stack and aggregation:
df['col3'] = df[['col1','col2']].stack().groupby(level=0).sum()
Or:
df = pd.DataFrame({'col1':[[1, 2, 3], [4, 6], [7, 11], None, [], np.nan],
'col2':[[4, 5, 6], [43], np.nan, [9, 9],[13, 17], np.nan]})
df['col3'] = (df.stack()
.groupby(level=0)
.agg(lambda x: [z for y in x for z in y]))
print (df)
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] [43] [4, 6, 43]
2 [7, 11] NaN [7, 11]
3 None [9, 9] [9, 9]
4 [] [13, 17] [13, 17]
5 NaN NaN NaN