Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Running computations on sums of a column's unique values in pandas data frame

I have a pandas data frame as follows:

       State   ProductName2 Units Scripts Total Amount Market Share
0         FL          Advil     1       5            1             
1         FL          Advil     2       7           54             
2         FL    Advil Total     3      12           55             
3         FL        Mucinex     3       6          321             
4         FL  Mucinex Total     3       6          321             
5         FL        Solosec     4       4           45             
6         FL  Solosec Total     4       4           45             
7   FL Total                   10      22          421             
8         GA          Advil     5       2           89             
9         GA          Advil     6       9           48             
10        GA    Advil Total    11      11          137             
11        GA        Mucinex     7      10           23             
12        GA        Mucinex     8       2           56             
13        GA  Mucinex Total    15      12           79             
14  GA Total                   26      23          216             
15        LA          Advil     9       6            9             
16        LA          Advil     4       7           26             
17        LA          Advil     2      12           32             
18        LA    Advil Total    15      25           67             
19  LA Total                   15      25           67             

There are subtotals being run for each state and also for each unique ‘ProductName2’ in each state. For the ‘Market Share’ column, I want it to be equal to scripts/total scripts for the state (every state’s market share total should be 100%).

This is what I tried (my data frame is labeled ‘result_df’):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

rows = result_df.values.tolist()


state_sums = []

states = []


for row in rows:
    
    if 'Total' in row[0]:
        
        state_sums.append(row[3])


for state in result_df['State'].unique():
    
    if 'Total' not in state:
        
        states.append(state)


for row in rows:
    
    if row[0] in states:
        
        index = states.index(row[0])
        
        row[5] = (float(row[3]) / float(state_sums[index]))


print(result_df)

This is what I want my data frame to look like:

       State   ProductName2 Units Scripts Total Amount Market Share
0         FL          Advil     1       5            1       .227      
1         FL          Advil     2       7           54       .318    
2         FL    Advil Total     3      12           55       .545   
3         FL        Mucinex     3       6          321       .273      
4         FL  Mucinex Total     3       6          321       .273      
5         FL        Solosec     4       4           45       .182      
6         FL  Solosec Total     4       4           45       .182      
7   FL Total                   10      22          421       #(this value should be 1.00 but kept 
                                                              #omitted)      
# And so on for the following states...   

I see how this can be done for only one state, but I’m not sure how to do this for multiple states. The part that is stumping me is how to tell the code to divide the scripts value by the subtotal of scripts for the correct state. Any help on this would be amazing, thanks in advance! 🙂

>Solution :

If the "Total" for each state is always the last row within that group, you can do this much more efficiently:

df["Market Share"] = df["Scripts"].div(df["Scripts"].where(df["State"].str.contains("Total")).bfill())

>>> df
       State   ProductName2  Units  Scripts  Total Amount  Market Share
0         FL          Advil      1        5             1      0.227273
1         FL          Advil      2        7            54      0.318182
2         FL    Advil Total      3       12            55      0.545455
3         FL        Mucinex      3        6           321      0.272727
4         FL  Mucinex Total      3        6           321      0.272727
5         FL        Solosec      4        4            45      0.181818
6         FL  Solosec Total      4        4            45      0.181818
7   FL Total                    10       22           421      1.000000
8         GA          Advil      5        2            89      0.086957
9         GA          Advil      6        9            48      0.391304
10        GA    Advil Total     11       11           137      0.478261
11        GA        Mucinex      7       10            23      0.434783
12        GA        Mucinex      8        2            56      0.086957
13        GA  Mucinex Total     15       12            79      0.521739
14  GA Total                    26       23           216      1.000000
15        LA          Advil      9        6             9      0.240000
16        LA          Advil      4        7            26      0.280000
17        LA          Advil      2       12            36      0.480000
18        LA    Advil Total     15       25            67      1.000000
19  LA Total                    15       25            67      1.000000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading