Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove string from one column if present in string of another column pandas

I feel like I’m close but I am looking for something like this where the new column writes the company name without the city in it:

                           company  postal_code  name state         city  \
2000-01-01          abc gresham co        97080  john    mi      gresham   
2000-01-01             startup llc        97080  jeff    hi     portland   
2001-01-01  beaverton business biz        99999   sam    ca    beaverton   
2002-01-01                 andy co        92222  joey    or  los angeles   

                 new_col  
2000-01-01        abc co  
2000-01-01   startup llc  
2001-01-01  business biz  
2002-01-01       andy co  

This is what I have so far but it’s throwing a TypeError: unhashable type: 'Series':

for idx in df1.index:
    if df1["city"].loc[idx] in df1['company'].loc[idx]:
        print("figure out how to print to new column the company name without the city included")
    else:
        print(df1['company'].loc[idx])

Thanks!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here’s a solution:

df = (
    df.reset_index()
    .assign(new_col=df.reset_index()
        .pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
        .explode('x')
        .loc[lambda x: x['x'] != x['city'], 'x']
        .groupby(level=0)
        .agg(list)
        .str.join(' ')
    )
    .set_index('index')
)

Output:

>>> df
                           company  postal_code  name state         city       new_col
index                                                                                 
2000-01-01          abc gresham co        97080  john    mi      gresham        abc co
2000-01-01             startup llc        97080  jeff    hi     portland   startup llc
2001-01-01  beaverton business biz        99999   sam    ca    beaverton  business biz
2002-01-01                 andy co        92222  joey    or  los angeles       andy co

One-liner:

df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading