How to properly concatenate str_values from dataframe rows with NaN values in Pandas?

May 31, 2023

Smash rows from dataframe with NaN values

I have this dataframe:

description	quantity	value
This is the description	2	100,00
for the first row
Row number 2	10	150,00
Row number 3	15	200,00

As we can see, the description for the first row spans two lines, but I have no way to know in advance how many lines a description takes.
I have tried iterating over the rows: if a row's NaN count equals df.shape[1] - 1 (every column except one is NaN), I take temp_description_col = dataframe.description.iloc[row_index_iterator] and join it onto the previous row's description, dataframe.description.iloc[row_index_iterator - 1]. I have also tried out = (df.bfill().groupby(['Importe'], as_index=False).agg({'Concepto': ' '.join})), but it only creates the first row from the smashed rows.

The problem is that it seems very rudimentary to check every row and overwrite the previous row's description whenever all columns except one are NaN. There is also the possibility that a description-only row belongs to the row after it rather than the row before it. For example:

description	quantity	value
This is the description
for the first row	2	100,00
Row 2 description
and Row2 continuation	15	200,00

In this dataframe, the first row's description belongs to the second row, since the rest of its columns are NaN, and the third row's description belongs to the fourth row, which becomes the second row once the rows are smashed together.

Wanted output:

df1

description	quantity	value
This is the description for the first row	2	100,00
Row number 2	10	150,00
Row number 3	15	200,00

df2

description	quantity	value
This is the description for the first row	2	100,00
Row 2 description and Row2 continuation	15	200,00

Solution:

For the general logic, you can use a custom aggregation:

# group empty rows (except on "description") with previous ones
group = df.drop(columns='description').notna().any(axis=1).cumsum()

# concatenate "description" else get first value
agg = {c: 'first' for c in df}
agg['description'] = ' '.join

out = df.groupby(group).agg(agg)

Output:

                                 description  quantity   value
1  This is the description for the first row       2.0  100,00
2                               Row number 2      10.0  150,00
3                               Row number 3      15.0  200,00
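
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the first table from the question and prints the intermediate group key. The construction of df is an assumption about how the data looks once loaded (NaN in the empty cells, value kept as text); the name starts_record is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's first table (assumed layout:
# empty cells are NaN, "value" is a string with a decimal comma).
df = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# True where a row has data outside "description", i.e. it starts a record.
starts_record = df.drop(columns="description").notna().any(axis=1)

# Running count of record starts: description-only rows keep the previous key.
group = starts_record.cumsum()

# Join the description pieces, take the first non-null value elsewhere.
agg = {c: "first" for c in df}
agg["description"] = " ".join

out = df.groupby(group).agg(agg)
print(group.tolist())
print(out)
```

The key comes out as [1, 1, 2, 3], so the description-only second row lands in the same group as the row before it.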

Now, to handle the case in which the first row can be empty except for its description (the description text comes before the row it belongs to), use:

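# True for rows that have a value outside "description"
m = df.drop(columns='description').notna().any(axis=1)
# if the first row is description-only, shift the mask up by one row
# so that each description-only row starts its group
group = m.shift(-1 if not m.iloc[0] else 0, fill_value=False).cumsum()

# concatenate "description" else get first value
agg = {c: 'first' for c in df}
agg['description'] = ' '.join

out = df.groupby(group).agg(agg)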

Output:

                                 description  quantity   value
1  This is the description for the first row       2.0  100,00
2    Row 2 description and Row2 continuation      15.0  200,00
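
One caveat, as far as I can tell: the shift(-1) version above pairs each full row with exactly one description-only row before it. If a leading description can span several rows, a variation is to start a new group on the row after each full row, i.e. shift the mask down by one instead. The sketch below is that swapped-in variant, not the code above; the helper name merge_description_rows, its leading flag, and the three-line sample description are hypothetical, and you still have to know (or detect) which layout the data uses.

```python
import numpy as np
import pandas as pd


def merge_description_rows(df, text_col="description", leading=False):
    """Collapse description-only rows into the neighbouring full row.

    leading=False: extra text follows the full row (first layout).
    leading=True:  extra text precedes the full row (second layout).
    """
    # True where a row has data outside the text column.
    full = df.drop(columns=text_col).notna().any(axis=1)
    if leading:
        # A new record starts on the row after each full row.
        group = full.shift(1, fill_value=False).cumsum()
    else:
        # A new record starts on each full row.
        group = full.cumsum()
    agg = {c: "first" for c in df}
    agg[text_col] = " ".join
    return df.groupby(group).agg(agg).reset_index(drop=True)


# Hypothetical variant of the second table: the first description
# takes three lines, all of them before the full row.
df2 = pd.DataFrame({
    "description": ["This is the", "description", "for the first row",
                    "Row 2 description", "and Row2 continuation"],
    "quantity": [np.nan, np.nan, 2, np.nan, 15],
    "value": [np.nan, np.nan, "100,00", np.nan, "200,00"],
})

out = merge_description_rows(df2, leading=True)
print(out)
```

Note that in the leading layout any description-only rows left over after the last full row end up in a group of their own, so it is worth checking the tail of the result.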