I have a dataframe (adjusted for simplicity) as follows:
Location Code Technology ... Frequency
0 ABLSERVP Type A ... 850
2 ABLSERVP Type A ... 700
4 ABLSERVP Type B ... 850
... ... ... ... ...
1300 CSEY3 Type A ... 2100
1301 CSEY3 Type A ... 700
... ... ... ... ...
265064 CSEY1 Type A ... 750
265065 CSEY3 Type B ... 850
What I’m trying to achieve:
Location Code Technologies ... Type A's Type B's ...
0 ABLSERVP Type A,B ... 700,850 850 ...
... ... ... ... ... ... ...
265064 CSEY1 Type A ... 750 n/a ...
265065 CSEY3 Type A,B ... 700,2100 850 ...
Is there anyway to do this without having to loop through the entire dataframe (I’ve read that this is inefficient and is one of the LAST resort).
My attempt:
I first sorted based on location code as follows:
x=x.sort_values(by='Location Code')
I thought I could get the required result by doing: df = x.groupby(['Location Code', 'Technology']).sum()
This obviously doesn’t work as it sums the frequencies instead of listing them. Any help?
>Solution :
Try with groupby, pivot and join:
tech = x.groupby("Location Code")["Technology"].agg(lambda x: ", ".join(x.unique().tolist()))
pivoted = (x.pivot_table(index="Location Code",
columns="Technology",
values="Frequency",
aggfunc=lambda x: ", ".join(x.astype(str)))
)
output = tech.to_frame().join(pivoted)
>>> output
Technology Type A Type B
Location Code
ABLSERVP Type A, Type B 850, 700 850
CSEY1 Type A 750 NaN
CSEY3 Type A, Type B 2100, 700 850