I have code that imports a file and concatenates the data horizontally. My input file looks like this:
| X | Y |
|---|---|
| a | hello |
| a | 3 |
| a | bye |
| a | hi |
| b | apple |
| b | orange |
| b | 4 |
and this is the output I need:
| X | Y |
|---|---|
| a | hello,3,bye,hi |
| b | apple,orange,4 |
I run this Python code in Jupyter:
import pandas as pd
# df=pd.read_excel('test.xlsx')
df = pd.DataFrame({"X": ["a", "a", "a", "a", "b", "b", "b"],
                   "Y": ["hello", 3, "bye", "hi", "apple", "orange", 4]})
orden = df.groupby('X').Y.apply(','.join)
error: TypeError: sequence item 0: expected str instance, int found
I have checked other data, and I suspect it fails because of the integers. How can I change my code so that it concatenates both numbers and strings?
>Solution :
Convert the Y column to strings first, because ','.join only accepts str items:
import pandas as pd
df = pd.DataFrame({"X": ["a", "a", "a", "a", "b", "b", "b"],
                   "Y": ["hello", 3, "bye", "hi", "apple", "orange", 4]})
# Cast every value in Y to str so that ','.join accepts it
df["Y"] = df["Y"].astype(str)
orden = df.groupby('X').Y.apply(','.join)
which gives orden =
X
a    hello,3,bye,hi
b    apple,orange,4
Name: Y, dtype: object
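
If you would rather not overwrite the original Y column, one possible variation is to do the string conversion inside the aggregation itself. This is only a sketch under that assumption, not the only way to do it; the name `tabla` is made up for illustration, and `reset_index()` is used to get back a two-column table like the one in the question.

```python
import pandas as pd

df = pd.DataFrame({"X": ["a", "a", "a", "a", "b", "b", "b"],
                   "Y": ["hello", 3, "bye", "hi", "apple", "orange", 4]})

# Cast each value to str inside the aggregation;
# df["Y"] itself keeps its original mixed types.
orden = df.groupby("X")["Y"].agg(lambda s: ",".join(map(str, s)))

# reset_index() turns the grouped Series into a DataFrame with columns X and Y
# ("tabla" is a hypothetical name used only in this example).
tabla = orden.reset_index()
print(tabla)
```

The trade-off is that the lambda runs once per group in Python, so for large data the `astype(str)` approach above may be faster.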