I have data that looks like this:
import pandas as pd
import numpy as np
mydict = {
'col1' : ['a', 'b', 'c'],
'col2' : ['d', np.nan, 'e'],
'col3' : ['f', 'g', 'h']
}
mydf = pd.DataFrame(mydict)
I want to concatenate these string columns. I tried this, but it doesn’t work:
mydf['concat'] = mydf[['col1', 'col2', 'col3']].apply('-'.join, axis=1)
The error is TypeError: sequence item 0: expected str instance, float found.
How can I make this work? It should skip missing values and concatenate only the non-missing ones. The result should look like this:
concat_dict = {
'col1' : ['a', 'b', 'c'],
'col2' : ['d', np.nan, 'e'],
'col3' : ['f', 'g', 'h'],
'concat' : ['a-d-f', 'b-g', 'c-e-h']
}
concat_df = pd.DataFrame(concat_dict)
>Solution :
The error occurs because NaN is a float, and str.join accepts only strings. Filter out the missing values inside the lambda function, then join what remains.
>>> mydf['concat'] = mydf[['col1', 'col2', 'col3']].apply(
...     lambda s: '-'.join(s[s.notnull()]), axis=1)
>>> mydf
  col1 col2 col3 concat
0    a    d    f  a-d-f
1    b  NaN    g    b-g
2    c    e    h  c-e-h
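If you prefer a named function over a lambda, the same idea can be sketched as a standalone script. This is only an illustration: the helper name join_non_missing and its sep parameter are made up here, and it uses dropna() in place of the boolean mask, which should behave the same way for this data. The astype(str) call is an extra safeguard in case a column holds non-string values.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({
    'col1': ['a', 'b', 'c'],
    'col2': ['d', np.nan, 'e'],
    'col3': ['f', 'g', 'h'],
})

def join_non_missing(row, sep='-'):
    # Hypothetical helper: drop the NaN entries in this row,
    # cast what is left to str, and join with the separator.
    return sep.join(row.dropna().astype(str))

cols = ['col1', 'col2', 'col3']
mydf['concat'] = mydf[cols].apply(join_non_missing, axis=1)
print(mydf['concat'].tolist())
```

Because apply with axis=1 calls the function once per row, this is convenient rather than fast; for a frame of this size the difference is unlikely to matter.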