Remove unwanted characters from Dataframe values in Pandas

I have the following DataFrame full of locus/gene names from a multiple genome alignment.

However, I only want the locus names, without the leading index or the coordinates.

    Tuberculosis_locus  Smagmatis_locus                  H37RA_locus              Bovis_locus
0   0:Rv0001:1-1524     1:MSMEG_RS33460:6986600-6988114  2:MRA_RS00005:1-1524     3:BQ2027_RS00005:1-1524
1   0:Rv0002:2052-3260  1:MSMEG_RS00005:499-1692         2:MRA_RS00010:2052-3260  3:BQ2027_RS00010:2052-3260
2   0:Rv0003:3280-4437  1:MSMEG_RS00015:2624-3778        2:MRA_RS00015:3280-4437  3:BQ2027_RS00015:3280-4437

To avoid issues with empty cells, I fill them with 'N/A' and then strip the unwanted characters. But the result is exactly the same as the input; nothing seems to happen.

for value in orthologs['Tuberculosis_locus']:
    orthologs['Tuberculosis_locus'] = orthologs['Tuberculosis_locus'].fillna("N/A")
    orthologs['Tuberculosis_locus'] = orthologs['Tuberculosis_locus'].map(lambda x: x.lstrip('\d:').rstrip(':\d+'))

Any idea on what I am doing wrong? I’d like the following output:

    Tuberculosis_locus  Smagmatis_locus  H37RA_locus  Bovis_locus
0   Rv0001              MSMEG_RS33460    MRA_RS00005  BQ2027_RS00005
1   Rv0002              MSMEG_RS00005    MRA_RS00010  BQ2027_RS00010
2   Rv0003              MSMEG_RS00015    MRA_RS00015  BQ2027_RS00015

Solution:

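Split each value on ':' with a maximum of two splits, then take the second element (index 1), e.g.:

orthologs.applymap(lambda v: v.split(':', 2)[1])

This returns a new DataFrame, so assign the result back. Every cell must be a string for this to work, so fill any empty cells first. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.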
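As for why the original attempt changed nothing: str.lstrip and str.rstrip treat their argument as a set of literal characters to remove, not as a regular expression, and the values start and end with digits that are not in that set. The sketch below shows this on one value and then applies the same split idea column by column through the .str accessor, which leaves missing cells as NaN without a fillna step. The two-column sample frame, the None cell, and the helper name locus_names are assumptions made for illustration.

```python
import pandas as pd

# Sample rows taken from the question; the None cell is an added assumption
# to show how a missing value behaves.
orthologs = pd.DataFrame({
    "Tuberculosis_locus": ["0:Rv0001:1-1524", "0:Rv0002:2052-3260", None],
    "Smagmatis_locus": [
        "1:MSMEG_RS33460:6986600-6988114",
        "1:MSMEG_RS00005:499-1692",
        "1:MSMEG_RS00015:2624-3778",
    ],
})

# lstrip/rstrip remove any of the listed characters (backslash, 'd', ':', '+'),
# not a regex match. The value starts with '0' and ends with '4', so nothing
# is stripped and the string comes back unchanged.
stripped = "0:Rv0001:1-1524".lstrip(r"\d:").rstrip(r":\d+")


def locus_names(frame):
    """Return a new DataFrame keeping only the middle field of 'index:name:coords'."""
    # Split each column on ':' (at most two splits) and take element 1.
    return frame.apply(lambda col: col.str.split(":", n=2).str[1])


names = locus_names(orthologs)
print(names)
```

Going through the .str accessor avoids calling a Python method on a non-string cell, which is what would make the plain lambda version fail on NaN.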