Divide data into train, validation, and test based on group ID and index

I have the dataset below. I want to split it into training, validation, and test sets (60/20/20), taking both the group ID and the row index into account.

Example: for Group_ID = 1, the first 60% of the rows go to training (indexes 0, 1, 2), the next 20% to validation (index 3), and the rest to testing (index 4), and so on for every group ID.

import pandas as pd

df = pd.DataFrame({'Group_ID':[1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
                   'Target': [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1]})



Solution:

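Try groupby with sample, then drop the sampled indexes. Note that sample picks rows at random within each group, so this gives an approximate 60/20/20 split per group but does not follow the index order: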

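# 60% of each group, chosen at random
train = df.groupby('Group_ID').sample(frac=0.6)
# half of the remaining 40% of each group, i.e. about 20% of the original
test = df.drop(train.index).groupby('Group_ID').sample(frac=0.5)
# whatever is left, the other 20%
valid = df.drop(train.index).drop(test.index)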
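
If the split has to follow the row order inside each group, as in the Group_ID = 1 example from the question, one possible sketch is to use each row's position within its group instead of random sampling. This is not part of the answer above; the names pos and size are illustrative, and the rounding convention (a row goes to training when its position is strictly below 60% of the group size) is an assumption. It reproduces the example for group 1, but other groups may be off by one row compared with a different rounding rule.

```python
import pandas as pd

df = pd.DataFrame({
    'Group_ID': [1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
    'Target':   [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1],
})

# Position of each row inside its group (0, 1, 2, ...), in index order.
pos = df.groupby('Group_ID').cumcount()
# Number of rows in the group each row belongs to.
size = df.groupby('Group_ID')['Group_ID'].transform('size')

# Integer comparisons avoid floating-point edge cases at the 60% / 80% cut points.
is_train = pos * 10 < size * 6
is_valid = ~is_train & (pos * 10 < size * 8)
is_test = ~is_train & ~is_valid

train = df[is_train]
valid = df[is_valid]
test = df[is_test]

# Rows of group 1 in each split, by original index.
print(train[train['Group_ID'] == 1].index.tolist())
print(valid[valid['Group_ID'] == 1].index.tolist())
print(test[test['Group_ID'] == 1].index.tolist())
```

For group 1 this puts indexes 0, 1, 2 in train, index 3 in valid, and index 4 in test. Every row lands in exactly one of the three frames, and the original index is kept so the splits can be traced back to df.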