Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Add column information based on length of another dataframe

I have two data frames. I want to add the columns from the second data frame to the first one by matching the column information in data frame 1.

The first data frame: dd

         BCS        libcs
0        AAA-1        1
1        ABA-1        1
2        ACD-1        1
3        AAT-1        1
4        AAR-1        1
           ...      ...
188453  TCC-61       61
188454  TTG-61       61
188455  CTG-61       61
188456  ATG-61       61
188457  TTT-61       61

print(dd.shape)
(188458, 2)

The second dataframe: df

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     sample_id       DState
0    H20_AGS          AG
1    H21_AGS          AG
2    H22_BGS          BG
3    H23_AGS          AG
4    H24_CGS          CG
..   ...     ...
56    H90_DGS          DG
57    H91_DGS          DG
58    H92_BGS          BG
59    H93_BGS          BG
60    H94_CGS          CG

print(df.shape)
(61, 2)

There are 61 unique items in column libcs in dataframe1 and 61 rows in the second data frame. I want to merge based on the matching column information.

Expected Output:

         BCS        libcs     sample_id       DState
0        AAA-1        1       H20_AGS          AG
1        ABA-1        1       H20_AGS          AG
2        ACD-1        1       H20_AGS          AG
3        AAT-1        1       H20_AGS          AG
4        AAR-1        1       H20_AGS          AG
           ...      ...
188453  TCC-61       61       H94_CGS          CG
188454  TTG-61       61       H94_CGS          CG
188455  CTG-61       61       H94_CGS          CG
188456  ATG-61       61       H94_CGS          CG
188457  TTT-61       61       H94_CGS          CG

I tried the below code but it seems like it is not working

dd.libcs.value_counts()
dd.index = dd.groupby(level=0).cumcount()
C = dd.join(df).reset_index(drop=True)
print (C)

>Solution :

Use merge

>>> dd.merge(df, left_on='libcs', right_on=(df.index+1).astype(str))

      BCS  libcs sample_id DState
0   AAA-1      1   H20_AGS     AG
1   ABA-1      1   H20_AGS     AG
2   ACD-1      1   H20_AGS     AG
3   AAT-1      1   H20_AGS     AG
4   AAR-1      1   H20_AGS     AG
5  TCC-61     61   H94_CGS     CG
6  TTG-61     61   H94_CGS     CG
7  CTG-61     61   H94_CGS     CG
8  ATG-61     61   H94_CGS     CG
9  TTT-61     61   H94_CGS     CG
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading