
How to split a DataFrame into separate groups based on a value (time difference)

I'm using pandas in Python to process some data.

The simplified DataFrame has the columns [ID, TimeDiff], plus some other columns that aren't important here.


For example:

73    1.166667
74    1.166667
75    2.183333
76    3.466667
77    2.666667
78         NaN
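To experiment with the answer, the sample can be rebuilt as a DataFrame. This is a minimal sketch: it assumes only the two columns shown above and uses the default 0..5 index, which matches the index in the answer's output.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the last TimeDiff is missing (NaN)
df = pd.DataFrame({
    "ID": [73, 74, 75, 76, 77, 78],
    "TimeDiff": [1.166667, 1.166667, 2.183333, 3.466667, 2.666667, np.nan],
})
print(df)
```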

TimeDiff is the time difference between one ID and the next: for example, 3.466667 means the time between ID 76 and ID 77 is 3.466667 hours.
I want to split the vessel data so that the time differences within each group stay within 2 hours,
so I need to split the DataFrame into N different groups (in this example, N = 4).
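The post does not show how TimeDiff is computed. Assuming a hypothetical datetime column, one way to put on each row the gap (in hours) to the next row, leaving the last row NaN as in the sample, is diff() followed by shift(-1). The timestamps below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical timestamps (not from the question)
times = pd.Series(pd.to_datetime([
    "2022-01-01 00:00", "2022-01-01 01:10",
    "2022-01-01 02:20", "2022-01-01 04:31",
]))

# diff() gives t[i] - t[i-1]; shift(-1) moves it up one row so that
# row i holds t[i+1] - t[i], i.e. the gap to the NEXT row (last row -> NaN)
time_diff = times.diff().shift(-1).dt.total_seconds() / 3600
print(time_diff)
```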

I need a result like this, where a new group starts whenever TimeDiff >= 2:

--------------
73    1.166667
74    1.166667
-------------
75    2.183333
--------------
76    3.466667
--------------
77         NaN

I have tried using groupby in pandas:

df.groupby('TimeDiff')

But that groups together rows that share the same TimeDiff value rather than consecutive rows, so it is not what I want.
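A quick way to see why grouping on the column itself fails: groupby('TimeDiff') collects rows that share an identical value, orders the groups by that value, and by default (dropna=True) leaves out the NaN row. A small sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [73, 74, 75, 76, 77, 78],
    "TimeDiff": [1.166667, 1.166667, 2.183333, 3.466667, 2.666667, np.nan],
})

# Each key is a distinct TimeDiff value, not a run of consecutive rows;
# ID 77 (2.666667) is listed before ID 76 (3.466667), and ID 78 is dropped
for value, group in df.groupby("TimeDiff"):
    print(value, group["ID"].tolist())
```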
I'm now trying to split the DataFrame step by step, like this:
From

73    1.166667
74    1.166667
--------------
75    2.183333
76    3.466667
77    2.666667
78         NaN

To

73    1.166667
74    1.166667
--------------
75    2.183333
--------------
76    3.466667
77    2.666667
78         NaN

Then To

73    1.166667
74    1.166667
--------------
75    2.183333
--------------
76    3.466667
--------------
77    2.666667
78         NaN

……

Finally to what I want:

--------------
73    1.166667
74    1.166667
-------------
75    2.183333
--------------
76    3.466667
--------------
77         NaN

That gives 4 groups of data. I searched Google and Stack Overflow but didn't find a proper way to handle this. Can somebody help me?
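The step-by-step splitting described above can also be written as an explicit loop. This is a minimal, non-vectorized sketch: it walks the rows in order and starts a new chunk at every row whose TimeDiff is >= 2 or missing (NaN is treated as a break, as in the desired output).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [73, 74, 75, 76, 77, 78],
    "TimeDiff": [1.166667, 1.166667, 2.183333, 3.466667, 2.666667, np.nan],
})

chunks = []   # finished pieces of the DataFrame
current = []  # index labels of the piece being built
for idx, diff in df["TimeDiff"].items():
    # A large or missing gap closes the current chunk and starts a new one
    if current and (pd.isna(diff) or diff >= 2):
        chunks.append(df.loc[current])
        current = []
    current.append(idx)
if current:
    chunks.append(df.loc[current])

for chunk in chunks:
    print(chunk, "\n")
```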

Solution:

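You can use cumsum to create your groups. Replace the trailing NaN with infinity so that it also counts as a break, flag every row where TimeDiff >= 2, and take the running count of those flags as the group label: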

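import numpy as np
# NaN -> inf so the last row also starts a group; each True (>= 2) bumps the label by one
df['Group'] = df['TimeDiff'].fillna(np.inf).ge(2).cumsum()
print(df)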

# Output
   ID  TimeDiff  Group
0  73  1.166667      0
1  74  1.166667      0
2  75  2.183333      1
3  76  3.466667      2
4  77  2.666667      3
5  78       NaN      4
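To see why the one-liner works, it can be broken into its three steps. A sketch on the sample data; the intermediate names filled, is_break and group are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [73, 74, 75, 76, 77, 78],
    "TimeDiff": [1.166667, 1.166667, 2.183333, 3.466667, 2.666667, np.nan],
})

filled = df["TimeDiff"].fillna(np.inf)  # NaN -> inf, so the last row compares as >= 2
is_break = filled.ge(2)                 # True on every row that starts a new group
group = is_break.cumsum()               # running count of breaks = group label

print(pd.DataFrame({"ID": df["ID"], "is_break": is_break, "Group": group}))
```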

To get each group as a separate DataFrame, pass the same key to groupby:

>>> list(df.groupby(df['TimeDiff'].fillna(np.inf).ge(2).cumsum()))
[(0,
     ID  TimeDiff  Group
  0  73  1.166667      0
  1  74  1.166667      0),

 (1,
     ID  TimeDiff  Group
  2  75  2.183333      1),

 (2,
     ID  TimeDiff  Group
  3  76  3.466667      2),

 (3,
     ID  TimeDiff  Group
  4  77  2.666667      3),

 (4,
     ID  TimeDiff  Group
  5  78       NaN      4)]
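If you want a plain list of DataFrames instead of (key, DataFrame) pairs, a minimal sketch is to unpack the groupby result. Passing the key Series directly, without first assigning a Group column, keeps that helper column out of each piece; sort=False keeps groups in order of first appearance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [73, 74, 75, 76, 77, 78],
    "TimeDiff": [1.166667, 1.166667, 2.183333, 3.466667, 2.666667, np.nan],
})

key = df["TimeDiff"].fillna(np.inf).ge(2).cumsum()

# One DataFrame per group, in original row order
groups = [g for _, g in df.groupby(key, sort=False)]

# Alternatively, a dict mapping group label -> DataFrame
by_label = dict(tuple(df.groupby(key)))

for g in groups:
    print(g, "\n")
```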