How to properly concatenate str_values from dataframe rows with NaN values in Pandas?

Smash rows from dataframe with NaN values

I have this dataframe:

description              quantity  value
This is the description  2         100,00
for the first row        NaN       NaN
Row number 2             10        150,00
Row number 3             15        200,00

As we can see, the description for the first row takes 2 lines, but I have no way of knowing in advance how many lines a description spans.
I have tried iterating over the rows: if a row has NaN in every column except description (that is, row.isna().sum() == df.shape[1] - 1), I take temp_description_col = dataframe.description.iloc[row_index_iterator] and join it onto the description of the previous row, dataframe.description.iloc[row_index_iterator - 1]. I have also tried out = (df.bfill().groupby(['Importe'], as_index=False).agg({'Concepto': ' '.join})), but it creates the first row with the smashed rows.
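The per-row check described above can be expressed without an explicit loop. The following is only a sketch: the dataframe is a hypothetical reconstruction of the example, with NaN in the empty cells and the value column kept as strings because of the decimal comma.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example; the continuation line has NaN
# in every column except "description".
df = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# True for rows that carry nothing but a description fragment
is_continuation = df.drop(columns="description").isna().all(axis=1)
print(is_continuation.tolist())  # [False, True, False, False]
```

This mask only identifies the fragments. It does not say whether a fragment belongs to the row before or the row after it, which is the ambiguity raised in the next example.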

The problem is that it seems very rudimentary to check every row and overwrite the description of the previous row whenever I find NaN in all columns except one. It also ignores the possibility that a description-only line belongs to the row after it. For example:

description              quantity  value
This is the description  NaN       NaN
for the first row        2         100,00
Row 2 description        NaN       NaN
and Row2 continuation    15        200,00

In this dataframe the first description belongs to the second line, since the rest of its columns are NaN, and the third description belongs to the fourth line, which after smashing becomes the second row.

Wanted output:

df1

description                                quantity  value
This is the description for the first row  2         100,00
Row number 2                               10        150,00
Row number 3                               15        200,00

df2

description                                quantity  value
This is the description for the first row  2         100,00
Row 2 description and Row2 continuation    15        200,00

>Solution :

For the general logic, you can use a custom aggregation:

# group empty rows (except on "description") with previous ones
group = df.drop(columns='description').notna().any(axis=1).cumsum()

# concatenate "description" else get first value
agg = {c: 'first' for c in df}
agg['description'] = ' '.join

out = df.groupby(group).agg(agg)

Output:

                                 description  quantity   value
1  This is the description for the first row       2.0  100,00
2                               Row number 2      10.0  150,00
3                               Row number 3      15.0  200,00
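To see why this works, it can help to look at the grouping key on its own. The snippet below is a self-contained sketch of the same approach; the dataframe construction is an assumption based on the example in the question, and the name has_data is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's first dataframe
df = pd.DataFrame({
    "description": ["This is the description", "for the first row",
                    "Row number 2", "Row number 3"],
    "quantity": [2, np.nan, 10, 15],
    "value": ["100,00", np.nan, "150,00", "200,00"],
})

# True where a row holds a value in any column other than "description"
has_data = df.drop(columns="description").notna().any(axis=1)

# The running count increases on every data row, so a description-only row
# inherits the number of the data row above it
group = has_data.cumsum()
print(group.tolist())  # [1, 1, 2, 3]

# Join the description fragments; take the first non-null value elsewhere
agg = {c: "first" for c in df}
agg["description"] = " ".join
out = df.groupby(group).agg(agg)
print(out)
```

The quantity column comes back as float (2.0, 10.0, 15.0) because the NaN in the original column forces a float dtype.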

To handle the case in which the first row holds only a description (so each description-only row belongs to the row after it), use:

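# rows that hold a value in any column other than "description"
m = df.drop(columns='description').notna().any(axis=1)
# if the first row is description-only, shift the mask up by one so that
# each description-only row is grouped with the row that follows it
group = m.shift(-1 if not m.iloc[0] else 0, fill_value=False).cumsum()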

agg = {c: 'first' for c in df}
agg['description'] = ' '.join

out = df.groupby(group).agg(agg)

Output:

                                 description  quantity   value
1  This is the description for the first row       2.0  100,00
2    Row 2 description and Row2 continuation      15.0  200,00
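As far as I can tell, shifting the mask by one position fits the example because each record there has exactly one description-only line before its data row. If a record could span more leading lines, one possible variant (an assumption, not part of the answer above) is to start a new group on the line right after each data row. The sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: description fragments come BEFORE the row that holds
# the values, and the first record spans three lines.
df = pd.DataFrame({
    "description": ["This is", "the description", "for the first row",
                    "Row 2 description", "and Row2 continuation"],
    "quantity": [np.nan, np.nan, 2, np.nan, 15],
    "value": [np.nan, np.nan, "100,00", np.nan, "200,00"],
})

# True where a row holds a value in any column other than "description"
has_data = df.drop(columns="description").notna().any(axis=1)

# Shift the mask down by one: the counter increases on the line AFTER each
# data row, so every fragment is grouped with the next data row below it
group = has_data.shift(fill_value=False).cumsum()
print(group.tolist())  # [0, 0, 0, 1, 1]

agg = {c: "first" for c in df}
agg["description"] = " ".join
out = df.groupby(group).agg(agg)
print(out["description"].tolist())
```

This variant only covers the layout where fragments precede their data row. A file that mixes both layouts would need some other signal to decide which row a fragment belongs to.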