here is the sample data for my question
| pid | date | task | language |
|---|---|---|---|
| 8522 | 17-09-2022 | listen | english |
| 7851 | 17-09-2022 | read | hindi |
| 7546 | 17-09-2022 | telugu | |
| 3584 | write | ||
| 8522 | 18-09-2022 | read | |
| 7851 | 19-09-2022 | hindi | |
| 8522 | speak | hindi | |
| 7546 | 19-09-2022 | speak | english |
| 3584 | write | tamil | |
| 8522 | 25-09-2022 |
i tried to code and got the output
data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))
and i got this output
and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output
any answer or suggestion is helpful
>Solution :
You can use pandas.Series.dropna in the lambda expression before joining.
Here is what works for me:
out = (
df
.groupby('pid')[['date','task','language']]
.agg(lambda x: '|'.join(x
.replace(r'^\s*$', np.nan, regex=True)
.dropna()
.astype(str)
)
)
)
print(out)
date task language
pid
3584 write|write tamil
7546 17-09-2022|19-09-2022 speak telugu|english
7851 17-09-2022|19-09-2022 read hindi|hindi
8522 17-09-2022|18-09-2022|25-09-2022 listen|read|speak english|hindi

