I have a dataframe with several columns, two of which contain URI strings ending in a fragment, such as:
http://company.com/information#name
http://company.com/information#Company
I need to keep only the fragments ("name" and "Company") and remove everything before them, up to and including the pound sign (#).
I have written the following function to do this. It takes the dataframe, a list of the column names to act upon, and the string to remove from each of them:
def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        DF['DF_COLUMN'] = DF['DF_COLUMN'].map(lambda x: x.replace(URI_STRING,''))
    return DF
which I invoke as:
my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#")
to strip the passed URI string from the "class" and "type" columns,
but I get the following error:
KeyError: 'DF_COLUMN'
What am I overlooking/misunderstanding?
Thank you
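Solution:
You are indexing with the literal string 'DF_COLUMN' instead of the loop variable DF_COLUMN, so pandas looks for a column actually named "DF_COLUMN" and raises the KeyError. Remove the quotes: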
DF[DF_COLUMN] = DF[DF_COLUMN].…
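For reference, here is a sketch of the question's function with only the quotes removed. The two-row sample dataframe is made up for illustration, using the URIs from the question; the function otherwise behaves as before (it still modifies DF in place and returns it).

```python
import pandas as pd


def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        # DF_COLUMN without quotes is the loop variable ('class', then 'type'),
        # not a column literally named "DF_COLUMN".
        DF[DF_COLUMN] = DF[DF_COLUMN].map(lambda x: x.replace(URI_STRING, ''))
    return DF


# Hypothetical sample data built from the URIs in the question.
my_df = pd.DataFrame({
    'class': ['http://company.com/information#name'],
    'type': ['http://company.com/information#Company'],
})
my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#")
print(my_df)
```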
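That said, a simpler approach is to use a regex with the .str accessor; map calls a Python lambda for every element, which can be slow on large dataframes:
for col in ['col', 'col2']:
    # here extracting any terminal fragment (rows without a '#' become NaN). You could also use
    # f'{re.escape(URI_STRING)}([^#]+)$' to match only fragments that follow URI_STRING
    df[col] = df[col].str.extract(r'#([^#]+)$', expand=False)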
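To compare the two patterns side by side, the sketch below writes each result to a separate column instead of overwriting the original. The dataframe contents, and the `_any`/`_uri` column suffixes, are hypothetical. re.escape is used so that the `.` in `company.com` is matched literally rather than as "any character".

```python
import re

import pandas as pd

URI_STRING = "http://company.com/information#"

# Hypothetical sample data: one value has no fragment, one has a different base URI.
df = pd.DataFrame({
    'col': ['http://company.com/information#name',
            'http://company.com/information#Company',
            'no-fragment-here'],
    'col2': ['http://company.com/information#Company',
             'http://other.org/schema#label',
             'http://company.com/information#name'],
})

for col in ['col', 'col2']:
    # Any terminal fragment: capture everything after the last '#'.
    df[col + '_any'] = df[col].str.extract(r'#([^#]+)$', expand=False)
    # Only fragments that directly follow URI_STRING; anything else becomes NaN.
    df[col + '_uri'] = df[col].str.extract(
        f'{re.escape(URI_STRING)}([^#]+)$', expand=False
    )

print(df)
```

With expand=False and a single capture group, str.extract returns a Series, so it can be assigned straight back to a column. Values that do not match the pattern come back as NaN rather than unchanged, which differs from the replace-based approach in the question.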
Another critique of your code: you are both modifying DF in place and returning it. You should do only one of the two.
Either modify it in place and return nothing, or return a new dataframe. For the second option, make a copy by adding DF = DF.copy() at the beginning of the function.
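A minimal sketch of both options, keeping the question's map/replace logic. The function names uri_fragment_inplace and uri_fragment_copy are hypothetical, chosen only to tell the two variants apart.

```python
import pandas as pd


def uri_fragment_inplace(df: pd.DataFrame, columns: list, uri_string: str) -> None:
    """Option 1: modify the caller's dataframe in place and return nothing."""
    for col in columns:
        df[col] = df[col].map(lambda x: x.replace(uri_string, ''))


def uri_fragment_copy(df: pd.DataFrame, columns: list, uri_string: str) -> pd.DataFrame:
    """Option 2: leave the caller's dataframe untouched and return a modified copy."""
    df = df.copy()  # later assignments only affect this copy
    for col in columns:
        df[col] = df[col].map(lambda x: x.replace(uri_string, ''))
    return df


URI = "http://company.com/information#"
original = pd.DataFrame({'class': ['http://company.com/information#name']})

cleaned = uri_fragment_copy(original, ['class'], URI)
print(original['class'].iloc[0])  # http://company.com/information#name
print(cleaned['class'].iloc[0])   # name

uri_fragment_inplace(original, ['class'], URI)
print(original['class'].iloc[0])  # name
```

Returning None from the in-place variant mirrors pandas' own convention for inplace=True methods, which makes it harder to mistake one style for the other.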