How to downsample a DataFrame in Python based on a condition

I am new here so don’t know how to use this site.

I have time-series data for 37404 ICU patients. Each patient has multiple rows. I want to downsample my dataframe and select only 2932 patients (keeping all rows for each selected patient ID). Can anyone help me? My data looks like this:

HR SBP DBP Sepsis P_ID
92 120 80 0 0
98 115 85 0 0
93 125 75 0 1
95 130 90 0 1
102 120 80 0 1
109 115 75 0 2
94 135 100 0 2
97 100 70 0 3
85 120 80 0 4
88 115 75 0 4
93 125 85 0 4
78 130 90 0 5
115 140 110 0 5
102 120 80 0 5
98 140 110 0 5

I know I should use some condition on the P_ID column, but I am not sure how.

Thanks for the help.

>Solution:

Use numpy.random.choice to pick random P_ID values, then filter with Series.isin and boolean indexing:

import numpy as np
df2 = df[df['P_ID'].isin(np.random.choice(df['P_ID'].unique(), size=2932, replace=False))]

Alternative:

df2 = df[df['P_ID'].isin(df['P_ID'].drop_duplicates().sample(n=2932))]
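
For anyone who wants to try this before running it on the full data, here is a minimal, self-contained sketch of the same approach on a small frame shaped like the one in the question. The toy values, the `n_patients` variable and the `random_state` seed are assumptions added for illustration; on the real data `n_patients` would be 2932.

```python
import pandas as pd

# Toy frame shaped like the question's data (values are illustrative only).
df = pd.DataFrame({
    "HR":     [92, 98, 93, 95, 102, 109, 94, 97, 85, 88],
    "SBP":    [120, 115, 125, 130, 120, 115, 135, 100, 120, 115],
    "DBP":    [80, 85, 75, 90, 80, 75, 100, 70, 80, 75],
    "Sepsis": [0] * 10,
    "P_ID":   [0, 0, 1, 1, 1, 2, 2, 3, 4, 4],
})

n_patients = 3  # assumption for the toy frame; 2932 on the real data

# Draw patient IDs without replacement; random_state makes the draw repeatable.
sampled_ids = df["P_ID"].drop_duplicates().sample(n=n_patients, random_state=0)

# Keep every row that belongs to a sampled patient, in the original row order.
df2 = df[df["P_ID"].isin(sampled_ids)]

# Sanity checks: exactly n_patients IDs, and no rows lost for any of them.
assert df2["P_ID"].nunique() == n_patients
for pid in sampled_ids:
    assert (df2["P_ID"] == pid).sum() == (df["P_ID"] == pid).sum()
```

Dropping `random_state` gives a different set of patients on each run, which matches the behaviour of the one-liners above.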

EDIT: To return the sampled patients in random order rather than in their original order, use:

df1 = df['P_ID'].drop_duplicates().sample(n=2932).to_frame('P_ID')

df2 = df.merge(df1, how='right')
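
A small sketch of what the merge variant does, again on toy data. The frame, `n_patients` and the `random_state` seed are assumptions for illustration. It relies on a right merge following the row order of the right-hand frame, which is my understanding of how recent pandas versions behave; the check at the end confirms it for this example.

```python
import pandas as pd

# Toy frame with several rows per patient (values are illustrative only).
df = pd.DataFrame({
    "HR":   [92, 98, 93, 95, 102, 109, 94, 97, 85, 88],
    "P_ID": [0, 0, 1, 1, 1, 2, 2, 3, 4, 4],
})

n_patients = 3  # assumption for the toy frame; 2932 on the real data

# One row per sampled patient, in the (shuffled) order that sample() returned.
df1 = df["P_ID"].drop_duplicates().sample(n=n_patients, random_state=1).to_frame("P_ID")

# The right merge keeps only sampled patients and follows the order of df1.
df2 = df.merge(df1, how="right")

# Patients appear in df2 in the same order in which they were sampled.
sampled_order = df1["P_ID"].tolist()
result_order = df2["P_ID"].drop_duplicates().tolist()
assert result_order == sampled_order
```

Note that merge returns a fresh integer index, so the original row labels of df are not kept in df2.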