Array_Split with grouped string indices

I have a DataFrame that I would like to split into sub-arrays (i.e. chunks) based on groups of string values in the index. I’ve read that you can pass a list of string values as the indices argument of np.array_split, but my scenario is a bit more complicated and I’m unsure of the best approach.

From the table below, I’d like to get 2 sub-arrays: one containing the rows with index values "Alpha" and "Bravo", the other containing the rows with index values "Charlie" and "Delta".

Example table:

Index    Column1  Column2
Alpha    sample        12
Alpha    sample        13
Alpha    sample        14
Bravo    sample        15
Charlie  sample        16
Charlie  sample        17
Delta    sample        18
Delta    sample        19
Delta    sample        20
Delta    sample        21

Solution:

Assuming df is a pandas DataFrame in which "Index" is a regular column, and that you want to split by custom groups:

groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]

dfs = [g for _, g in df.groupby(df['Index'].map({k: v for v,l in enumerate(groups) for k in l}))]

Output:

dfs[0]

   Index Column1  Column2
0  Alpha  sample       12
1  Alpha  sample       13
2  Alpha  sample       14
3  Bravo  sample       15


dfs[1]

     Index Column1  Column2
4  Charlie  sample       16
5  Charlie  sample       17
6    Delta  sample       18
7    Delta  sample       19
8    Delta  sample       20
9    Delta  sample       21
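
For readers who want to run this end to end, here is a minimal sketch of the same approach with the one-liner unpacked into steps. The sample DataFrame is reconstructed from the example table, and the names label_to_group and group_ids are introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the example table in the question.
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
})

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# Map each label to the position of the group that contains it.
label_to_group = {label: i for i, labels in enumerate(groups) for label in labels}

# Translate every row's label into its group number.
# Labels missing from `groups` become NaN, and groupby drops NaN keys by default.
group_ids = df["Index"].map(label_to_group)

# One sub-DataFrame per group number, in ascending order of group number.
dfs = [g for _, g in df.groupby(group_ids)]

print(dfs[0])
print(dfs[1])
```

Because groupby sorts its keys, dfs follows the order of the lists in groups, as long as every group matches at least one row.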

Or, if "Index" is actually the index:

groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]

dfs = [df.loc[l] for l in groups]

Output:

dfs[0]

      Column1  Column2
Alpha  sample       12
Alpha  sample       13
Alpha  sample       14
Bravo  sample       15

dfs[1]

        Column1  Column2
Charlie  sample       16
Charlie  sample       17
Delta    sample       18
Delta    sample       19
Delta    sample       20
Delta    sample       21
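
A runnable sketch of the index-based variant follows, again with data reconstructed from the example table. It also shows a boolean-mask alternative using Index.isin, which is an addition here rather than part of the original answer: df.loc with a list of labels raises a KeyError if any label is absent from the index, whereas isin simply matches nothing for that label.

```python
import pandas as pd

# Same sample data, but with the labels stored in the (non-unique) index.
df = pd.DataFrame(
    {"Column1": ["sample"] * 10, "Column2": list(range(12, 22))},
    index=["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
)

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# Label-based selection: returns all rows for each listed label.
dfs_loc = [df.loc[labels] for labels in groups]

# Mask-based selection: tolerates labels that are not in the index
# and keeps the rows in their original order.
dfs_isin = [df[df.index.isin(labels)] for labels in groups]

# "Echo" is a hypothetical label that does not occur in the data.
partial = df[df.index.isin(["Alpha", "Echo"])]

print(dfs_loc[0])
print(dfs_isin[1])
```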

Finally, if you don’t have explicit combinations in mind but just want groups of 2 distinct index values (in order of first appearance), then, with pandas imported as pd, use:

dfs = [g for _,g in df.groupby(pd.factorize(df['Index'])[0]//2)]
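
To make the factorize step less opaque, here is a small sketch that separates it from the grouping. The sample data is reconstructed from the example table, and the variable names codes and uniques are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
})

# factorize numbers each distinct label in order of first appearance:
# Alpha -> 0, Bravo -> 1, Charlie -> 2, Delta -> 3.
codes, uniques = pd.factorize(df["Index"])

# Integer division by 2 pairs consecutive labels:
# codes 0 and 1 become group 0, codes 2 and 3 become group 1.
dfs = [g for _, g in df.groupby(codes // 2)]

print(dfs[0])
print(dfs[1])
```

Changing the divisor changes the chunk size: codes // 3 would put three distinct labels in each chunk, with the last chunk holding whatever remains.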