I have a pandas data frame as follows:
State ProductName2 Units Scripts Total Amount Market Share
0 FL Advil 1 5 1
1 FL Advil 2 7 54
2 FL Advil Total 3 12 55
3 FL Mucinex 3 6 321
4 FL Mucinex Total 3 6 321
5 FL Solosec 4 4 45
6 FL Solosec Total 4 4 45
7 FL Total 10 22 421
8 GA Advil 5 2 89
9 GA Advil 6 9 48
10 GA Advil Total 11 11 137
11 GA Mucinex 7 10 23
12 GA Mucinex 8 2 56
13 GA Mucinex Total 15 12 79
14 GA Total 26 23 216
15 LA Advil 9 6 9
16 LA Advil 4 7 26
17 LA Advil 2 12 32
18 LA Advil Total 15 25 67
19 LA Total 15 25 67
There are subtotals being run for each state and also for each unique ‘ProductName2’ in each state. For the ‘Market Share’ column, I want it to be equal to scripts/total scripts for the state (every state’s market share total should be 100%).
This is what I tried (my data frame is labeled ‘result_df’):
rows = result_df.values.tolist()


state_sums = []

states = []


for row in rows:

if 'Total' in row[0]:

state_sums.append(row[3])


for state in result_df['State'].unique():

if 'Total' not in state:

states.append(state)


for row in rows:

if row[0] in states:

index = states.index(row[0])

row[5] = (float(row[3]) / float(state_sums[index]))


print(result_df)
This is what I want my data frame to look like:
State ProductName2 Units Scripts Total Amount Market Share
0 FL Advil 1 5 1 .227
1 FL Advil 2 7 54 .318
2 FL Advil Total 3 12 55 .545
3 FL Mucinex 3 6 321 .273
4 FL Mucinex Total 3 6 321 .273
5 FL Solosec 4 4 45 .182
6 FL Solosec Total 4 4 45 .182
7 FL Total 10 22 421 #(this value should be 1.00 but kept
#omitted)
# And so on for the following states...
I see how this can be done for only one state, but I’m not sure how to do this for multiple states. The part that is stumping me is how to tell the code to divide the scripts value by the subtotal of scripts for the correct state. Any help on this would be amazing, thanks in advance! 🙂
>Solution :
If the "Total" for each state is always the last row within that group, you can do this much more efficiently:
df["Market Share"] = df["Scripts"].div(df["Scripts"].where(df["State"].str.contains("Total")).bfill())
>>> df
State ProductName2 Units Scripts Total Amount Market Share
0 FL Advil 1 5 1 0.227273
1 FL Advil 2 7 54 0.318182
2 FL Advil Total 3 12 55 0.545455
3 FL Mucinex 3 6 321 0.272727
4 FL Mucinex Total 3 6 321 0.272727
5 FL Solosec 4 4 45 0.181818
6 FL Solosec Total 4 4 45 0.181818
7 FL Total 10 22 421 1.000000
8 GA Advil 5 2 89 0.086957
9 GA Advil 6 9 48 0.391304
10 GA Advil Total 11 11 137 0.478261
11 GA Mucinex 7 10 23 0.434783
12 GA Mucinex 8 2 56 0.086957
13 GA Mucinex Total 15 12 79 0.521739
14 GA Total 26 23 216 1.000000
15 LA Advil 9 6 9 0.240000
16 LA Advil 4 7 26 0.280000
17 LA Advil 2 12 36 0.480000
18 LA Advil Total 15 25 67 1.000000
19 LA Total 15 25 67 1.000000