Add an empty row to a DataFrame when the entries of a column repeat

I have a DataFrame that stores time-series data. The code below builds it:

import pandas as pd
from pprint import pprint

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)
pprint(df)

df>
t  input type  value
0      2    A    0.1
1      2    A    0.2
2      2    A    0.3
0      2    B    1.0
2      2    B    2.0
0      2    B    3.0
1      4    A    1.0

Whenever the first entry of column t appears again, I would like to insert an empty row before it.

Expected output:

df>
t  input type  value
0      2    A    0.1
1      2    A    0.2
2      2    A    0.3

0      2    B    1.0
2      2    B    2.0

0      2    B    3.0
1      4    A    1.0

I am not sure how to do this. Suggestions will be really helpful.

EDIT:
dup = df['t'].eq(0).shift(-1, fill_value=False)

helps when the starting value in column t is 0.

But the starting value could also be non-zero, as in the additional example below:

d = {
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}

Solution:

There are several ways to achieve this.

Option 1

You can use groupby.apply, starting a new group at each row where t is 0 and appending an empty row to every group:

(df.groupby(df['t'].eq(0).cumsum(), as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)

output:

     t  input type  value
0  0.0    2.0    A    0.1
1  1.0    2.0    A    0.2
2  2.0    2.0    A    0.3
   NaN    NaN  NaN    NaN
3  0.0    2.0    B    1.0
4  2.0    2.0    B    2.0
   NaN    NaN  NaN    NaN
5  0.0    2.0    B    3.0
6  1.0    4.0    A    1.0
   NaN    NaN  NaN    NaN
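The output above ends with an extra empty row and fills the gaps with NaN, while the expected output in the question has blank separators only between blocks. A minimal sketch of one way to get that, assuming a plain loop over the groups is acceptable in place of groupby.apply (the names group_id, blank, and pieces are illustrative and not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# Start a new block at every row where 't' is 0.
group_id = df['t'].eq(0).cumsum()

# A single separator row: '' in every column, with '' as its index label.
blank = pd.DataFrame({c: [''] for c in df.columns}, index=[''])

# Interleave each block with a separator row.
pieces = []
for _, block in df.groupby(group_id):
    pieces.extend([block, blank])

# Drop the last separator so the result does not end with an empty row.
out = pd.concat(pieces[:-1])
print(out)
```

Because the separator rows hold empty strings, every column in the result ends up with object dtype, so this form is better suited to display or export than to further numeric work.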

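Option 2

An alternative if the index is already sorted: flag the last row of each block, append a blank copy of those rows, and use a stable sort so that each blank row stays directly after the row whose index label it shares:

dup = df['t'].eq(0).shift(-1, fill_value=False)

pd.concat([df, df.loc[dup].assign(**{c: '' for c in df})]).sort_index(kind='stable')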

output:

   t input type value
0  0     2    A   0.1
1  1     2    A   0.2
2  2     2    A   0.3
2                    
3  0     2    B   1.0
4  2     2    B   2.0
4                    
5  0     2    B   3.0
6  1     4    A   1.0

Addendum on grouping

If the starting value is not 0, start a new group whenever the value of t decreases:

dup = df['t'].diff().lt(0).cumsum()

(df.groupby(dup, as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)
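The question asks for a break whenever the first entry of t appears again, which is not quite the same as a decrease in t. A minimal sketch of that variant, built on the blank-row-and-sort idea from option 2 and assuming the index is sorted and that the first entry of t marks the start of every block (the names starts, ends, and blanks are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# A block starts wherever 't' equals the first entry of the column.
starts = df['t'].eq(df['t'].iloc[0])

# Flag the row just before each block start, i.e. the last row of every
# block except the final one.
ends = starts.shift(-1, fill_value=False)

# Blank rows reuse the index label of the row they should follow.
blanks = pd.DataFrame({c: '' for c in df.columns},
                      index=df.index[ends.to_numpy()])

# A stable sort keeps each blank row right after its original row.
out = pd.concat([df, blanks]).sort_index(kind='stable')
print(out)
```

For both examples in the question this gives the same breaks as the decrease-based grouping. The two differ when t can drop inside a block, or when a block can start with a value other than the first entry.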