Add an empty row in a dataframe when the entries of a column repeats

byMR

May 9, 2022

I have a dataframe that stores time-series data. The code below builds it:

import pandas as pd
from pprint import pprint

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)
pprint(df)

df>
t  input type  value
0      2    A    0.1
1      2    A    0.2
2      2    A    0.3
0      2    B    1.0
2      2    B    2.0
0      2    B    3.0
1      4    A    1.0

Whenever the first value of column t appears again, I would like to insert an empty row before it.

Expected output:

df>
t  input type  value
0      2    A    0.1
1      2    A    0.2
2      2    A    0.3

0      2    B    1.0
2      2    B    2.0

0      2    B    3.0
1      4    A    1.0

I am not sure how to do this. Suggestions will be really helpful.

EDIT:
dup = df['t'].eq(0).shift(-1, fill_value=False)

helps when the starting value in column t is 0.

But it could also be a non-zero value like the example below.
Additional example:

d = {
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}

>Solution :

There are several ways to achieve this.

option 1

You can use groupby.apply, starting a new group each time t equals 0:

# each t == 0 starts a new group; one all-NaN row is appended to every group
(df.groupby(df['t'].eq(0).cumsum(), as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='', dtype='float64').to_frame().T]))
)

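output (an empty row also follows the last group, and the integer columns become floats because of the NaN values):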

     t  input type  value
0  0.0    2.0    A    0.1
1  1.0    2.0    A    0.2
2  2.0    2.0    A    0.3
   NaN    NaN  NaN    NaN
3  0.0    2.0    B    1.0
4  2.0    2.0    B    2.0
   NaN    NaN  NaN    NaN
5  0.0    2.0    B    3.0
6  1.0    4.0    A    1.0
   NaN    NaN  NaN    NaN
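To see what the grouping key in option 1 does, and how the trailing empty row could be avoided, here is a minimal sketch. It is not the method above: it swaps groupby.apply for a plain loop, and the names group_id, blank, pieces and result are only illustrative. It also assumes empty strings are acceptable as the "empty" row, which turns every column into object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# True wherever t == 0; the cumulative sum turns those marks into block ids
group_id = df['t'].eq(0).cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]

# One row of empty strings, labelled with an empty index value
blank = pd.DataFrame([[''] * df.shape[1]], columns=df.columns, index=[''])

# Put a blank row after every block, then drop the last one
pieces = []
for _, block in df.groupby(group_id):
    pieces.extend([block, blank])
result = pd.concat(pieces[:-1])
print(result)
```

The result has nine rows: the seven original rows plus one blank row between consecutive blocks, and none after the last block.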

option 2

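An alternative if the index is already sorted, which inserts empty strings instead of NaN:

# True on the row just before each t == 0
dup = df['t'].eq(0).shift(-1, fill_value=False)

# a stable sort keeps each blank row after the original row that shares its index label
pd.concat([df, df.loc[dup].assign(**{c: '' for c in df})]).sort_index(kind='mergesort')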

output:

   t input type value
0  0     2    A   0.1
1  1     2    A   0.2
2  2     2    A   0.3
2                    
3  0     2    B   1.0
4  2     2    B   2.0
4                    
5  0     2    B   3.0
6  1     4    A   1.0
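Option 2 can probably be adapted to the non-zero case from the question's EDIT by comparing against the first value of t instead of 0. The sketch below assumes the value that starts each block is always the first entry of the column; the names start, blank and result are illustrative, and a stable sort is requested so that each blank row stays after the row whose index label it shares.

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# Assumption: every block starts with the same value as the very first row
start = df['t'].iloc[0]

# True on the row just before each repeat of the starting value
dup = df['t'].eq(start).shift(-1, fill_value=False)

# Copies of those rows with every column blanked out; they keep the index labels
blank = df.loc[dup].assign(**{c: '' for c in df})

# Stable sort: originals come first in the concat, so they stay ahead of their blank copy
result = pd.concat([df, blank]).sort_index(kind='mergesort')
print(result)
```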

addendum on grouping

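To handle a non-zero starting value, start a new group whenever the value of t decreases:

# diff() < 0 marks every drop in t; cumsum turns the marks into group ids
dup = df['t'].diff().lt(0).cumsum()

(df.groupby(dup, as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='', dtype='float64').to_frame().T]))
)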
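As a quick check of the grouping key on the second example from the question, the sketch below computes only the group ids. The variable names are illustrative. It relies on t increasing within each block; a drop inside a block would start a new group.

```python
import pandas as pd

t = pd.Series([25, 35, 90, 25, 90, 25, 35], name='t')

# Difference to the previous row; the first entry is NaN and compares as False
drops = t.diff().lt(0)

# Each drop opens a new group, so the running total is the group id
group_id = drops.cumsum()
print(group_id.tolist())  # [0, 0, 0, 1, 1, 2, 2]
```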
