import pandas as pd
import numpy as np
text1 = ['22211', '1111', np.nan]  # np.nan: the np.NaN alias was removed in NumPy 2.0
Int1 = ['5555', np.nan, '4444']
Float1 = [np.nan, '3333.0', '231.0']
Text2 = ['222115555', '11113333', '4444231']
df = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1})
df_result = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1, 'Result': Text2})
I need to concatenate columns that contain NaN and mixed types into a single string column (the Result column in df_result above).
Need help understanding how to do this properly. I run into it over and over again and keep having issues.
I have tried setting each column to 'str' when using pd.read_csv or pd.read_excel.
I have also tried this, from other questions I have seen on here:
df = df.fillna("")  # so that astype('int') will work
df['Float1'] = df['Float1'].astype('int').astype('str')
I usually get this: ValueError: invalid literal for int() with base 10: ''
The df['Result'] values are usually keys in mapping dictionaries (there are more cols in the df above) for other datasets, so getting them in str format is usually best. I am open to other opinions, since I spend hours on this.
Just looking for ‘How to do this, conceptually and concretely’, since I continually run into this when trying to analyze/manipulate data from our legacy Excel/CSV sheets.
>Solution :
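IIUC, you could cast to dtype str, join the columns row-wise, then use str.replace to get rid of the 'nan' values. Then drop the trailing ".0" with a regex anchored at the end of the string; str.rstrip('.0') would strip every trailing '.' and '0' character, so a key that legitimately ends in 0 would be truncated. Note that this only removes a ".0" at the very end of the joined string, which is enough here because Float1 is the last column, and that replacing 'nan' would also alter any text that happens to contain "nan":
df['Result'] = df.astype(str).apply(''.join, axis=1).str.replace('nan', '').str.replace(r'\.0$', '', regex=True)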
Output:
   Text1  Int1  Float1     Result
0  22211  5555     NaN  222115555
1   1111   NaN  3333.0   11113333
2    NaN  4444   231.0    4444231
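On the conceptual side: the ValueError in the question comes from fillna(""), which leaves empty strings in the column, and int('') cannot be parsed. A string such as '3333.0' cannot be cast straight to int either. An alternative that avoids cleaning up 'nan' and '.0' after the join is to normalise each column first, then join. The following is only a sketch under stated assumptions: `numeric_cols` and `parts` are names made up for this example, and the numeric columns are assumed to hold whole numbers (the cast to the nullable Int64 dtype raises if a value such as 3333.5 shows up).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Text1': ['22211', '1111', np.nan],
    'Int1': ['5555', np.nan, '4444'],
    'Float1': [np.nan, '3333.0', '231.0'],
})

# Assumption: these columns hold whole numbers, possibly stored as '3333.0'.
numeric_cols = ['Int1', 'Float1']

parts = df.copy()
for col in numeric_cols:
    # Parse to float first, then to nullable Int64 so missing values
    # survive as <NA> instead of forcing a float representation.
    parts[col] = pd.to_numeric(parts[col], errors='coerce').astype('Int64')

# Cast everything to the pandas string dtype (missing values stay missing),
# blank out the missing values, then join each row into one key.
df['Result'] = parts.astype('string').fillna('').apply(''.join, axis=1)

print(df)
```

Text columns are left untouched, so keys with leading zeros or text that contains "nan" are preserved as they are.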