Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sum n consecutive entries in a sequence

apologies if the terminology isn’t right, I’m not quite sure how to ask the question accurately.

I would like to summarise my data, which for the purposes of a MWE consists of the time (in seconds) since starting the measurement (called ‘Start scan’ here), and then the total value (called ‘Sum.’ here). I have filtered the data down to those measurements above a certain cutoff. As the measurements rise and fall with a delay, each episode above the cutoff results in a few rows of data. I would like to reduce this down, so that each separate sequence is represented only by the first row in the sequence, and, if at all possible, also to create a new variable for this row containing the original number of rows

Here’s a short example.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

|       |   Start scan |        Sum. |
|------:|-------------:|------------:|
| 38191 |          247 | 7.06487e+06 |
| 38192 |          248 | 8.64094e+06 |
| 38193 |          249 | 8.82956e+06 |
| 38194 |          250 | 1.20744e+07 |
| 38195 |          251 | 1.28402e+07 |
| 38216 |          272 | 1.85569e+07 |
| 38217 |          273 | 2.83452e+07 |
| 38218 |          274 | 2.6532e+07  |
| 38219 |          275 | 2.2365e+07  |
| 38220 |          276 | 9.0689e+06  |
| 38230 |          286 | 6.92365e+06 |
| 38231 |          287 | 1.0727e+07  |
| 38232 |          288 | 7.81559e+06 |
| 38247 |          303 | 8.14354e+06 |
| 38381 |          437 | 8.4836e+06  |
| 38382 |          438 | 1.56635e+07 |
| 38383 |          439 | 1.24743e+07 |
| 38384 |          440 | 1.84585e+07 |
| 38385 |          441 | 1.82205e+07 |
| 38386 |          442 | 1.79627e+07 |
| 38387 |          443 | 8.56578e+06 |
| 38388 |          444 | 2.28438e+07 |
| 38389 |          445 | 8.19495e+06 |
| 38391 |          447 | 1.39808e+07 |
| 38392 |          448 | 8.30781e+06 |

I would like this converted to something like:

|       |   Start scan |        Sum. | scans summarised |
|------:|-------------:|------------:|-----------------:|
| 38191 |          247 | 7.06487e+06 |                 5|
| 38216 |          272 | 1.85569e+07 |                 5|
| 38230 |          286 | 6.92365e+06 |                 3|
| 38247 |          303 | 8.14354e+06 |                 1|
| 38381 |          437 | 8.4836e+06  |                 9|
| 38391 |          447 | 1.39808e+07 |                 2|

I’ve tried my best but can’t find a way to do this.

Very grateful for any help or pointers.

>Solution :

You can aggregate by consecutive values created by compare difference by Series.diff with 1 and cumulative sum by Series.cumsum, then aggregate columns in GroupBy.agg by GroupBy.first, GroupBy.sum and DataFrameGroupBy.size:

d = {'Start scan':('Start scan','first'), 
     'Sum.':('Sum.','sum'), 
     'scans summarised':('Sum.','size')}
out = df.groupby(df['Start scan'].diff().ne(1).cumsum()).agg(**d).reset_index(drop=True)
print (out)
   Start scan         Sum.  scans summarised
0         247   49449970.0                 5
1         272  104868000.0                 5
2         286   25466240.0                 3
3         303    8143540.0                 1
4         437  130867630.0                 9
5         447   22288610.0                 2

If original indices are important:

d = {'i':('index','first'), 
     'Start scan':('Start scan','first'),
     'Sum.':('Sum.','sum'), 
     'scans summarised':('Sum.','size')}

df1 = df.reset_index()
out = (df1.groupby(df1['Start scan'].diff().ne(1).cumsum())
          .agg(**d).set_index('i').rename_axis(None))
print (out)
       Start scan         Sum.  scans summarised
38191         247   49449970.0                 5
38216         272  104868000.0                 5
38230         286   25466240.0                 3
38247         303    8143540.0                 1
38381         437  130867630.0                 9
38391         447   22288610.0                 2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading