Apply a string replace to several columns of a pandas dataframe

by MR

February 14, 2022

I have a dataframe with several columns, two of which are strings of URIs with a final fragment such as:

http://company.com/information#name

http://company.com/information#Company

where I need to keep only the URI fragments "name" and "Company", and remove everything up to and including the pound sign (#).

I have written the following function to do so. It takes the dataframe, a list of column names to act upon, and the string to remove from each of them:

def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        DF['DF_COLUMN'] = DF['DF_COLUMN'].map(lambda x: x.replace(URI_STRING,''))
    return DF

which I invoke as:

my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#")

to strip the passed URI string from the "class" and "type" dataframe columns, but I get the following error:

KeyError: 'DF_COLUMN'

What am I overlooking/misunderstanding?
Thank you

>Solution :

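You are using a literal string in your function: 'DF_COLUMN' in quotes is a column named "DF_COLUMN", not the loop variable DF_COLUMN, and no such column exists. You should remove the quotes: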

DF[DF_COLUMN] = DF[DF_COLUMN].…
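To see the difference between the quoted and unquoted forms, here is a minimal sketch. The one-row dataframe and the variable name `column` are made up for illustration and are not part of the original code:

```python
import pandas as pd

# Hypothetical one-row dataframe, for illustration only
df = pd.DataFrame({"class": ["http://company.com/information#name"]})

column = "class"

# Unquoted: the variable is evaluated, so this looks up the column "class"
first_value = df[column].iloc[0]

# Quoted: this looks up a column literally named "column", which does not exist
raised = False
try:
    df["column"]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```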

That said, a simpler method would be to use a regex through the str accessor. map with a lambda will be quite slow:

for col in ['col', 'col2']:
    # Extract whatever follows the last '#'. For stricter matching you could
    # use f'{re.escape(URI_STRING)}([^#]+)$' instead (needs `import re`).
    df[col] = df[col].str.extract(r'#([^#]+)$', expand=False)
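As a rough, self-contained illustration of both patterns, the sketch below runs them on made-up data. The sample rows (including the `http://other.org/info#x` value) and the names `raw`, `cleaned`, and `limited` are assumptions for the example; only the column names and the URI prefix come from the question:

```python
import re

import pandas as pd

URI_STRING = "http://company.com/information#"

# Hypothetical sample data, for illustration only
raw = pd.DataFrame({
    "class": [
        "http://company.com/information#name",
        "http://company.com/information#Company",
        "http://other.org/info#x",
    ],
    "type": [
        "http://company.com/information#Company",
        "http://company.com/information#name",
        "http://company.com/information#name",
    ],
})

# Generic pattern: keep whatever follows the last '#'
cleaned = raw.copy()
for col in ["class", "type"]:
    cleaned[col] = cleaned[col].str.extract(r"#([^#]+)$", expand=False)

# Limited pattern: only match values that start with URI_STRING.
# re.escape keeps characters such as '.' from acting as regex operators.
pattern = f"^{re.escape(URI_STRING)}([^#]+)$"
limited = raw["class"].str.extract(pattern, expand=False)

print(cleaned)
print(limited)
```

With the limited pattern, rows that do not start with the given prefix come back as missing values rather than being left unchanged, which may or may not be what you want.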

Another criticism of your code: you are both returning DF and modifying it in place. You should do only one of the two.

Either modify in place and don’t return anything, or return a new dataframe and leave the input untouched. For the second option, make a copy of DF by adding DF = DF.copy() at the beginning of the function.
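Putting the pieces together, one possible version of the function that returns a new dataframe might look like the sketch below. It is only one way to write it: the lowercase parameter names, the use of `str.replace(..., regex=False)` in place of `map`, and the sample data are choices made here, not part of the original code:

```python
import pandas as pd


def uri_fragment(df: pd.DataFrame, column_list: list, uri_string: str) -> pd.DataFrame:
    """Return a copy of df with uri_string removed from the given columns."""
    out = df.copy()  # work on a copy so the caller's dataframe is not modified
    for column in column_list:
        # regex=False treats uri_string as plain text, so '.' and '#' are literal
        out[column] = out[column].str.replace(uri_string, "", regex=False)
    return out


# Hypothetical sample data, for illustration only
my_df = pd.DataFrame({
    "class": ["http://company.com/information#name"],
    "type": ["http://company.com/information#Company"],
})

result = uri_fragment(my_df, ["class", "type"], "http://company.com/information#")
print(result)
```

Because the function works on a copy, `my_df` keeps its original values and only `result` holds the cleaned columns.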