Smash rows from dataframe with NaN values
I have this dataframe:
| description | quantity | value |
|---|---|---|
| This is the description | 2 | 100,00 |
| for the first row | | |
| Row number 2 | 10 | 150,00 |
| Row number 3 | 15 | 200,00 |
As we can see, the description for the first row spans two lines, but I have no way to know in advance that a description is that long.
I have tried iterating over the rows: if a row's NaN count equals df.shape[1] - 1 (every column except the description is NaN), I take temp_description_col = dataframe.description.iloc[row_index_iterator] and join it onto the previous row's description, dataframe.description.iloc[row_index_iterator - 1]. I have also tried out = (df.bfill().groupby(['Importe'], as_index=False).agg({'Concepto': ' '.join})), but it creates the first row with the smashed rows.
The problem is that checking every row and overwriting the previous row's description whenever all columns but one are NaN seems very rudimentary, and it ignores the possibility that a description-only row belongs to the row after it. For example:
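For reference, the row check described above can be written as a vectorized mask instead of a loop. This is only a sketch: the sample frame is hypothetical data mirroring the first table, and the count-based test assumes the description column itself is never NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the first table in the question.
df = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# A row counts as a continuation line when every column except
# "description" is NaN (assumes "description" is never NaN itself).
is_continuation = df.isna().sum(axis=1) == df.shape[1] - 1
continuation_rows = df.index[is_continuation].tolist()
print(continuation_rows)
```

The mask only finds the description-only rows; it does not say whether such a row belongs to the row before or the row after it.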
| description | quantity | value |
|---|---|---|
| This is the description | | |
| for the first row | 2 | 100,00 |
| Row 2 description | | |
| and Row2 continuation | 15 | 200,00 |
In this dataframe, the description in the first row belongs to the second row, since the rest of its columns are NaN, and the description in the third row belongs to the fourth row, which becomes the second row once they are smashed together.
Wanted output:
df1
| description | quantity | value |
|---|---|---|
| This is the description for the first row | 2 | 100,00 |
| Row number 2 | 10 | 150,00 |
| Row number 3 | 15 | 200,00 |
df2
| description | quantity | value |
|---|---|---|
| This is the description for the first row | 2 | 100,00 |
| Row 2 description and Row2 continuation | 15 | 200,00 |
>Solution :
For the general logic, you can use a custom aggregation:
# group empty rows (except on "description") with previous ones
group = df.drop(columns='description').notna().any(axis=1).cumsum()
# concatenate "description" else get first value
agg = {c: 'first' for c in df}
agg['description'] = ' '.join
out = df.groupby(group).agg(agg)
Output:
| | description | quantity | value |
|---|---|---|---|
| 1 | This is the description for the first row | 2.0 | 100,00 |
| 2 | Row number 2 | 10.0 | 150,00 |
| 3 | Row number 3 | 15.0 | 200,00 |
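To make the grouping step easier to follow, here is a self-contained sketch that rebuilds the first example frame and prints the intermediate group keys. The sample data is an assumption reconstructed from the tables in the question, and the name has_data is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the first table in the question.
df = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# True where a row has data outside "description", i.e. starts a new record.
has_data = df.drop(columns="description").notna().any(axis=1)

# The cumulative sum gives description-only rows the key of the row above them.
group = has_data.cumsum()
print(group.tolist())  # [1, 1, 2, 3]

# Join the descriptions; take the first non-null value of every other column.
agg = {c: "first" for c in df}
agg["description"] = " ".join
out = df.groupby(group).agg(agg)
print(out)
```

Because the key only increases on rows that carry data, each description-only row is attached to the record that precedes it.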
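Now, to handle the case in which the first row holds only a description (so each description comes before its data row), use:
# True for rows that have data outside "description"
m = df.drop(columns='description').notna().any(axis=1)
# if the first row has no data, shift the mask up by one so that each group starts at its description
group = m.shift(0 if m.iloc[0] else -1, fill_value=False).cumsum()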
agg = {c: 'first' for c in df}
agg['description'] = ' '.join
out = df.groupby(group).agg(agg)
Output:
| | description | quantity | value |
|---|---|---|---|
| 1 | This is the description for the first row | 2.0 | 100,00 |
| 2 | Row 2 description and Row2 continuation | 15.0 | 200,00 |
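The two variants can be wrapped into one function and checked against both example frames. This is a minimal sketch: merge_split_rows and its text_col parameter are hypothetical names, the sample data is reconstructed from the question's tables, and it assumes a frame uses one layout throughout (descriptions either always continue below the data row or always start above it).

```python
import numpy as np
import pandas as pd


def merge_split_rows(df, text_col="description"):
    """Hypothetical helper: merge description-only rows into their record."""
    # True for rows that have data outside the text column.
    has_data = df.drop(columns=text_col).notna().any(axis=1)
    # If the first row has no data, descriptions precede their data row,
    # so shift the mask up by one to start each group at its description.
    periods = 0 if has_data.iloc[0] else -1
    group = has_data.shift(periods, fill_value=False).cumsum()
    # Join the text; take the first non-null value of every other column.
    agg = {c: "first" for c in df}
    agg[text_col] = " ".join
    return df.groupby(group).agg(agg).reset_index(drop=True)


# Layout 1: the description continues on the row below the data row.
df1 = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# Layout 2: the description starts on the row above the data row.
df2 = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row 2 description", "and Row2 continuation"],
    "quantity": [np.nan, 2, np.nan, 15],
    "value": [np.nan, "100,00", np.nan, "200,00"],
})

out1 = merge_split_rows(df1)
out2 = merge_split_rows(df2)
print(out1)
print(out2)
```

Deciding the layout from the first row alone is a heuristic; a frame that mixes both layouts would need an explicit rule for which neighbour each description-only row belongs to.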