Apply a string replace to several columns of a pandas dataframe

I have a dataframe with several columns, two of which are strings of URIs with a final fragment such as:

http://company.com/information#name

http://company.com/information#Company

where I need to keep only the URI fragments ("name" and "Company") and remove everything up to and including the pound sign.

I have written the following function to do so. It takes the dataframe, a list of column names to act upon, and the string to remove from each of those columns:

def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        DF['DF_COLUMN'] = DF['DF_COLUMN'].map(lambda x: x.replace(URI_STRING,''))
    return DF

which I invoke as:

my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#") 

to strip the passed URI string from the "class" and "type" dataframe columns, but I get the following error:

KeyError: 'DF_COLUMN'

What am I overlooking/misunderstanding?
Thank you

Solution:

You are indexing with the literal string 'DF_COLUMN' instead of the loop variable DF_COLUMN, so pandas looks for a column that is actually named "DF_COLUMN". Remove the quotes:

DF[DF_COLUMN] = DF[DF_COLUMN].…
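
To make the difference concrete, here is a minimal sketch of the two lookups. The one-row dataframe and the variable name `column` are invented for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({"class": ["http://company.com/information#name"]})

column = "class"

# Quoted: pandas looks for a column literally named "column" and fails.
try:
    df["column"]
    quoted_lookup_failed = False
except KeyError:
    quoted_lookup_failed = True

# Unquoted: the variable's value ("class") is used as the key.
values = df[column].tolist()
```

The quoted lookup raises the same kind of KeyError as in the question, while the unquoted one returns the column.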

That said, a simpler method would be to use a regex through the .str accessor. map with a Python lambda will be quite slow on large dataframes:

for col in ['col', 'col2']:
    # here extracting any terminal fragment. You could also use
    # f'{URI_STRING}([^#]+)$' for limited matching
    df[col] = df[col].str.extract('#([^#]+)$', expand=False)
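
As a rough, self-contained illustration of the two patterns mentioned in the comment, the sketch below runs both on made-up data. The sample rows are assumptions, and wrapping the prefix in `re.escape` is my addition (so that the dots in the URI are matched literally), not part of the answer above:

```python
import re

import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "class": [
        "http://company.com/information#name",
        "http://other.org/data#thing",
    ],
})
URI_STRING = "http://company.com/information#"

# Any terminal fragment: whatever follows the last '#'.
any_fragment = df["class"].str.extract(r"#([^#]+)$", expand=False)

# Limited matching: only fragments that follow the given prefix.
# Rows that do not match come back as NaN.
limited = df["class"].str.extract(
    f"{re.escape(URI_STRING)}([^#]+)$", expand=False
)
```

With `expand=False` and a single capture group, `str.extract` returns a Series, so the result can be assigned straight back to the column.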

Another critique of your code: you are both returning DF and modifying it in place. You should do only one of the two.

Either don’t return anything and modify in place, or return a new dataframe. For the second option, make a copy of DF by adding DF = DF.copy() at the beginning of the function.
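
The two options could look roughly like the sketch below. The function names `uri_fragment_copy` and `uri_fragment_inplace` and the sample data are hypothetical, and `str.replace(..., regex=False)` is used here as one possible way to strip a fixed prefix:

```python
import pandas as pd


def uri_fragment_copy(df: pd.DataFrame, columns: list, uri_string: str) -> pd.DataFrame:
    """Return a cleaned copy; the caller's dataframe is left untouched."""
    out = df.copy()
    for col in columns:
        # regex=False treats uri_string as plain text, not as a pattern.
        out[col] = out[col].str.replace(uri_string, "", regex=False)
    return out


def uri_fragment_inplace(df: pd.DataFrame, columns: list, uri_string: str) -> None:
    """Modify the caller's dataframe directly and return nothing."""
    for col in columns:
        df[col] = df[col].str.replace(uri_string, "", regex=False)


# Hypothetical sample data, invented for illustration.
prefix = "http://company.com/information#"
original = pd.DataFrame({
    "class": [prefix + "name"],
    "type": [prefix + "Company"],
})

cleaned = uri_fragment_copy(original, ["class", "type"], prefix)
untouched = original["class"].tolist()  # still holds the full URI

uri_fragment_inplace(original, ["class", "type"], prefix)
```

Returning None from the in-place variant makes it obvious at the call site that the argument itself was changed.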
