Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filling DataFrame with missing ranges present in two columns

I have a dataframe which contains starting and ending Timestamp acting as range for audio clipping, which can be generated like this:

import pandas as pd

df = pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'),2: pd.Timestamp('1900-01-01 00:00:18.743000'), 3: pd.Timestamp('1900-01-01 00:00:21.383000'), 4: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:16.342000'), 1: pd.Timestamp('1900-01-01 00:00:18.543000'), 2: pd.Timestamp('1900-01-01 00:00:20.712000'), 3: pd.Timestamp('1900-01-01 00:00:22.482000'), 4: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                      start                     end
0   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
1   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
2   1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
3   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
4   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

And I want to generate a dataframe filling starting and ending timestamp where these timestamp don’t come, meaning ranges where these entries is not present.
So something like this:

pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:00.000000'), 1: pd.Timestamp('1900-01-01 00:00:14.373000'), 2: pd.Timestamp('1900-01-01 00:00:16.342000'), 3: pd.Timestamp('1900-01-01 00:00:18.543000'), 4: pd.Timestamp('1900-01-01 00:00:20.712000'), 5: pd.Timestamp('1900-01-01 00:00:21.383000'), 6: pd.Timestamp('1900-01-01 00:00:22.482000'), 7: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'), 2: pd.Timestamp('1900-01-01 00:00:18.543000'), 3: pd.Timestamp('1900-01-01 00:00:20.712000'), 4: pd.Timestamp('1900-01-01 00:00:21.383000'), 5: pd.Timestamp('1900-01-01 00:00:22.482000'), 6: pd.Timestamp('1900-01-01 00:00:22.812000'), 7: pd.Timestamp('1900-01-01 00:00:24.653000')}})
                      start                     end
0   1900-01-01 00:00:00.000 1900-01-01 00:00:14.373
1   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3   1900-01-01 00:00:18.543 1900-01-01 00:00:20.712
4   1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
5   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
6   1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
7   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

I couldn’t come with any feasible solutions other than iterating individual rows, what could be the best way to do this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

IIUC, you could get all unique timestamps and generate a new dataframe from the shifted values:

vals = df[['start', 'end']].stack().unique()
vals2 = np.concatenate([np.array([0], dtype=vals.dtype), vals])
df2 = pd.DataFrame(zip(vals2, vals), columns=['start', 'end'])

output:

                    start                     end
0 1970-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:18.743
4 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
5 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
6 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
7 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
8 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading