Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Manipulating dataframe rows Python

I have a dataframe (adjusted for simplicity) as follows:

                Location Code  Technology  ... Frequency
0                    ABLSERVP      Type A  ...       850
2                    ABLSERVP      Type A  ...       700
4                    ABLSERVP      Type B  ...       850
...                       ...         ...  ...       ...
1300                    CSEY3      Type A  ...      2100
1301                    CSEY3      Type A  ...       700
...                       ...         ...  ...       ...
265064                  CSEY1      Type A  ...       750
265065                  CSEY3      Type B  ...       850

What I’m trying to achieve:

                Location Code  Technologies  ...  Type A's  Type B's  ...  
0                    ABLSERVP      Type A,B  ...   700,850       850  ...
...                       ...         ...    ...       ...       ...  ...
265064                  CSEY1        Type A  ...       750       n/a  ...
265065                  CSEY3      Type A,B  ...  700,2100       850  ...

Is there anyway to do this without having to loop through the entire dataframe (I’ve read that this is inefficient and is one of the LAST resort).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My attempt:
I first sorted based on location code as follows:

x=x.sort_values(by='Location Code')

I thought I could get the required result by doing: df = x.groupby(['Location Code', 'Technology']).sum()

This obviously doesn’t work as it sums the frequencies instead of listing them. Any help?

>Solution :

Try with groupby, pivot and join:

tech = x.groupby("Location Code")["Technology"].agg(lambda x: ", ".join(x.unique().tolist()))
pivoted = (x.pivot_table(index="Location Code", 
                         columns="Technology", 
                         values="Frequency", 
                         aggfunc=lambda x: ", ".join(x.astype(str)))
           )
output = tech.to_frame().join(pivoted)

>>> output

                   Technology     Type A Type B
Location Code                                  
ABLSERVP       Type A, Type B   850, 700    850
CSEY1                  Type A        750    NaN
CSEY3          Type A, Type B  2100, 700    850
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading