Home Pandas – Remove part of string in column that is already in another column

Questions

Pandas – Remove part of string in column that is already in another column

May 18, 2022

I have this dataframe :

dfA = pd.DataFrame({
            'A': ['abc','ghi','mno', 'stu'],
            'B': ['abcdef', 'jklghi', 'mnopqr', 'vwxstu']
         })
dfA

And I want to get this dataframe :

dfB = pd.DataFrame({
            'A': ['abc','ghi','mno', 'stu'],
            'B': ['abcdef', 'jklghi', 'mnopqr', 'vwxstu'],
            'C': ['def', 'jkl', 'pqr', 'vwx'],
         })
dfB

The column ‘C’ must contains the substrings of the column ‘B’ that is not in the strings in column ‘A’.

I tried to copy column ‘B’ into ‘C’ and then use df.replace() as shown below, but it doesn’t work :

dfA = pd.DataFrame({
            'A': ['abc','ghi','mno', 'stu'],
            'B': ['abcdef', 'jklghi', 'mnopqr', 'vwxstu']
         })
dfA.loc[:,'C'] = dfA['B']

dfA['C'].replace(dfA['B'], '', regex=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16112\1271772080.py in <cell line: 7>()
      5 dfA.loc[:,'C'] = dfA['B']
      6 
----> 7 dfA['C'].replace(dfA['B'], '', regex=True)

~\Anaconda3\envs\py310\lib\site-packages\pandas\core\series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   4958         method: str | lib.NoDefault = lib.no_default,
   4959     ):
-> 4960         return super().replace(
   4961             to_replace=to_replace,
   4962             value=value,

~\Anaconda3\envs\py310\lib\site-packages\pandas\core\generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6677                     # Operate column-wise
   6678                     if self.ndim == 1:
-> 6679                         raise ValueError(
   6680                             "Series.replace cannot use dict-like to_replace "
   6681                             "and non-None value"

ValueError: Series.replace cannot use dict-like to_replace and non-None value

Also, the strings in ‘A’ must be the pre/sufix of the ‘B’ column, so the ‘C’ column will be the su/prefix of ‘B’ strings. So, 'B' = 'A'+'C' | 'C'+'A', I also tried to use - as a "decatenation" operator, but it doesn’t work.

Do you have any idea of how I should do this instead ?

>Solution :

You need to loop here.

You can use re.sub:

import re
dfA['C'] = [re.sub(a, '', b) for a,b in zip(dfA['A'], dfA['B'])]

or str.replace:

dfA['C'] = [b.replace(a, '') for a,b in zip(dfA['A'], dfA['B'])]

output:

     A       B    C
0  abc  abcdef  def
1  ghi  jklghi  jkl
2  mno  mnopqr  pqr
3  stu  vwxstu  vwx

byMR

Published May 18, 2022

Add a comment

How to overload + operator in a Template array class to add every element with the same Index together

byMR

May 18, 2022

Questions

React onClick event only triggers when rendering component

byMR

May 18, 2022

Questions

T-SQL iterative calculation without loop

byMR

May 18, 2022

Questions

How to split an array of objects in java or Kotlin?

byMR

May 18, 2022

Questions

argument type 'Future Function(int, String, String, String)' can't be assigned to the parameter type 'void Function(int, String?, String?, String?)?

byMR

May 18, 2022

Questions

Understanding broadcasting and arithmetic operations on different dimension tensors

byMR

May 18, 2022

Pandas – Remove part of string in column that is already in another column

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to overload + operator in a Template array class to add every element with the same Index together

React onClick event only triggers when rendering component

T-SQL iterative calculation without loop

How to split an array of objects in java or Kotlin?

argument type 'Future Function(int, String, String, String)' can't be assigned to the parameter type 'void Function(int, String?, String?, String?)?

Understanding broadcasting and arithmetic operations on different dimension tensors

Keep Up to Date with the Most Important News

Pandas – Remove part of string in column that is already in another column

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to overload + operator in a Template array class to add every element with the same Index together

React onClick event only triggers when rendering component

T-SQL iterative calculation without loop

How to split an array of objects in java or Kotlin?

argument type 'Future Function(int, String, String, String)' can't be assigned to the parameter type 'void Function(int, String?, String?, String?)?

Understanding broadcasting and arithmetic operations on different dimension tensors

Discover more from Dev solutions