Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Possible combinations of pandas columns from a list

I have a pandas dataframe in python, with the following structure:

Date A B C D E F G H
2022-01-01 10:00:00 1 3 0 4 1 0 0 1
2022-01-01 11:00:00 0 1 1 2 0 2 3 5
2022-01-01 12:00:00 0 2 0 1 0 1 0 0
2022-01-01 13:00:00 0 0 2 0 0 3 1 2
2022-01-02 10:00:00 0 3 0 1 2 4 4 1
2022-01-02 11:00:00 0 0 6 0 0 0 5 0
2022-01-02 12:00:00 0 0 6 0 2 4 5 3
2022-01-02 13:00:00 0 0 0 1 3 0 0 0

This is a snippet of my real dataframe, which has lots of "letter" columns (more than 100). My problem is that I need to obtain the sum of the numbers for each datetime and letter for different combinations.

This was answered in All possible combinations as new columns of a dataframe based on primary column, but it computes all possible combinations from N to M elements. For a df of 100 columns, all possible combinations from 1 sigle letter column to the full combined 100 is madness and impossible to compute.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Fortunately, I am only interested in certain combinations (again, ca. 50 in the real df), which are given in a list:

list_possibilities = [‘A+B’, ‘A+B+D’, ‘B+D+E+G’, ‘F+H’, ‘D+E+F+G+H’, ‘D’, ‘F’, ‘G+H’]

As you can see, single columns such as "D" or "F" may also be a possibility. So the final df would be fairly compact, and wouldn’t need a brute-force computation of thousands of combinations (here shown only for the first two rows for simplicity):

Date A+B A+B+D B+D+E+G F+H D+E+F+G+H D F G+H
2022-01-01 10:00:00 4 8 8 1 6 4 0 1
2022-01-01 11:00:00 1 3 6 7 12 2 2 8

Knowing the prior structure of the combinations allows to exponentially decrease the combinations and sums pandas must do, but I am unable to generalize the previous code of the solution to this case. Any ideas?

>Solution :

Use concat in a dictionary comprehension:

out = (pd
   .concat({cols: df.set_index('Date')[cols.split('+')].sum(axis=1)
            for cols in list_possibilities}, axis=1)
   .reset_index()
)

Output:

                  Date  A+B  A+B+D  B+D+E+G  F+H  D+E+F+G+H  D  F  G+H
0  2022-01-01 10:00:00    4      8        8    1          6  4  0    1
1  2022-01-01 11:00:00    1      3        6    7         12  2  2    8
2  2022-01-01 12:00:00    2      3        3    1          2  1  1    0
3  2022-01-01 13:00:00    0      0        1    5          6  0  3    3
4  2022-01-02 10:00:00    3      4       10    5         12  1  4    5
5  2022-01-02 11:00:00    0      0        5    0          5  0  0    5
6  2022-01-02 12:00:00    0      0        7    7         14  0  4    8
7  2022-01-02 13:00:00    0      1        4    0          4  1  0    0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading