Efficient way to find unique values within time windows in Python?

I have a large pandas DataFrame that contains data similar to the attached image.

[image of sample data]

I want to count how many unique TN values exist within each 2-second window of the data. I've done this with a simple loop, but it is incredibly slow. Is there a better technique I can use?

My original code is:

uniqueTN = []
tmstart = 5400
tmstop = 86400
for tm in range(int(tmstart), int(tmstop), 2):
    # Select the rows whose time falls in the 2-second window [tm - 2, tm)
    df = rundf[(rundf['time'] >= (tm - 2)) & (rundf['time'] < tm)]
    uniqueTN.append(df['TN'].unique())

This solution would be fine if the data set were not so large.

>Solution :

You can floor each timestamp to the start of its 2-second window, then combine groupby() with nunique() to count the distinct TN values in each window.

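# Floor each timestamp to the start of its 2-second window (this overwrites 'time')
rundf['time'] = (rundf['time'] // 2) * 2
# Count the distinct TN values in each window
grouped = rundf.groupby('time')['TN'].nunique()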
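
As a minimal sketch of the groupby() approach, assuming `time` holds numeric seconds and using made-up sample data: the window label goes into a separate, hypothetical `window` column so the original timestamps are kept, and the result is reindexed because groupby() on its own leaves out windows that contain no rows.

```python
import pandas as pd

# Made-up sample data; 'time' is assumed to be numeric seconds.
rundf = pd.DataFrame({
    "time": [5400.0, 5400.5, 5401.9, 5402.0, 5403.5, 5406.1],
    "TN":   ["a", "b", "a", "c", "c", "d"],
})

# Label each row with the start of its 2-second window: [5400, 5402), [5402, 5404), ...
# 'window' is a hypothetical helper column, kept separate so 'time' is not overwritten.
rundf["window"] = ((rundf["time"] // 2) * 2).astype(int)

# Number of distinct TN values per window that has at least one row.
counts = rundf.groupby("window")["TN"].nunique()

# groupby() skips empty windows; reindex over every window start to report them as 0.
all_windows = list(range(5400, 5408, 2))
counts = counts.reindex(all_windows, fill_value=0)
print(counts)
```

If empty windows do not matter, the reindex step can be dropped.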

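Another alternative is pandas' resample() method followed by nunique(). This requires the time column to be datetime-like (datetime64 or timedelta64); if it holds plain numeric seconds, convert it first, for example with pd.to_datetime(rundf['time'], unit='s').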

grouped = rundf.resample('2S', on='time')['TN'].nunique()
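
A rough sketch of the resample() route, again assuming `time` holds numeric seconds and using made-up data. The seconds are converted to timestamps in a hypothetical `stamp` column first. The lowercase alias `"2s"` is used here because recent pandas releases deprecate the uppercase `"2S"` spelling.

```python
import pandas as pd

# Made-up sample data; 'time' is assumed to be numeric seconds.
rundf = pd.DataFrame({
    "time": [5400.0, 5400.5, 5401.9, 5402.0, 5403.5, 5406.1],
    "TN":   ["a", "b", "a", "c", "c", "d"],
})

# resample() needs a datetime-like key, so interpret the seconds as offsets
# from the Unix epoch. 'stamp' is a hypothetical column name.
rundf["stamp"] = pd.to_datetime(rundf["time"], unit="s")

# Count the distinct TN values in each 2-second bin.
# Unlike groupby(), resample() also emits bins that contain no rows.
per_window = rundf.resample("2s", on="stamp")["TN"].nunique()
print(per_window)
```

With the default settings the bins are anchored at midnight, so they line up with the even-second windows produced by the floor-division approach.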