How to store the basenames of multiple paths in a column in pandas

I have a pandas DataFrame like this:


Term.            DocFreq.  TermFreq.  Ngram.  Filenames

witness says     1        1          2       '/Users/KieraKatsalapov/Desktop//LuceneIndexing/Docs/cnnValBartCnnDocs/doc657.txt'
witness says of  2        2          3       '/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc192.txt,/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc153.txt'
.
.
.

I need to convert the filenames to the basenames. I know I can do this using

df['Filenames'] = df['Filenames'].apply(os.path.basename)

But this keeps only the last filename, because os.path.basename returns everything after the final slash of the whole string. For example, it converts the value in the 2nd entry directly to "doc153.txt".

Whereas I need it to be: "doc192.txt, doc153.txt"

I assume I need a lambda function that takes the whole Filenames value and returns a string containing all the basenames, but I don't know how to proceed.

Please help.

>Solution :

You can split each value on the comma, call os.path.basename on each piece, and then join the results back together with a comma:

import os
df['Filenames'] = df['Filenames'].apply(lambda x: ','.join(os.path.basename(y) for y in x.split(',')))
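
The same split, basename, join logic can also be pulled out into a small named function, which makes it easier to check on plain strings before applying it to the column. This is only a sketch: the name to_basenames and its sep parameter are made up for illustration, the sample strings are copied from the two rows shown in the question, and the whitespace stripping is an extra assumption in case the paths are separated by ", " rather than ",".

```python
import os


def to_basenames(paths, sep=","):
    """Return the basenames of a separator-delimited string of paths."""
    # Split on the separator, trim stray whitespace, and skip empty pieces
    parts = (p.strip() for p in paths.split(sep))
    # Reduce each path to its final component and rejoin with the same separator
    return sep.join(os.path.basename(p) for p in parts if p)


# Sample values modelled on the two rows shown in the question
single = "/Users/KieraKatsalapov/Desktop//LuceneIndexing/Docs/cnnValBartCnnDocs/doc657.txt"
double = (
    "/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc192.txt,"
    "/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc153.txt"
)

print(to_basenames(single))  # doc657.txt
print(to_basenames(double))  # doc192.txt,doc153.txt
```

On the DataFrame it would then be used as df['Filenames'] = df['Filenames'].apply(to_basenames). To get "doc192.txt, doc153.txt" with a space after the comma, join with ', ' instead of sep.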