Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to replace part of string using regular expression?

I have a dataframe like this:

fict={'well':['10B23','10B23','10B23','10B23','10B23','10B23'],
      'tag':['15B22|TestSep_OutletFlow','15B22|TestSep_GasOutletFlow','15B22|TestSep_WellNum','15B22|TestSep_GasPresValve','15B22|TestSep_Temp','WHT']}
df=pd.DataFrame(dict)
df

    well    tag
0   10B23   15B22|TestSep_OutletFlow
1   10B23   15B22|TestSep_GasOutletFlow
2   10B23   15B22|TestSep_WellNum
3   10B23   15B22|TestSep_GasPresValve
4   10B23   15B22|TestSep_Temp
5   10B23   WHT

Now I’d like to replace anything before | in column of tag to a string like 11A22, so the dataframe after replace should look like this:

well    tag
0   10B23   11A22|TestSep_OutletFlow
1   10B23   11A22|TestSep_GasOutletFlow
2   10B23   11A22|TestSep_WellNum
3   10B23   11A22|TestSep_GasPresValve
4   10B23   11A22|TestSep_Temp
5   10B23   WHT

I am thinking to use regular expression with group to replace group by a string, something in my mind look like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['tag2']=df['tag'].str.replace(r'([a-z0-9]*)|TestSep_[a-z0-9]*','11A22',regex=True)

then i got result of

well    tag tag2
0   10B23   15B22|TestSep_OutletFlow    11A2211A22B11A2211A22|11A2211A2211A22O11A2211A...
1   10B23   15B22|TestSep_GasOutletFlow 11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
2   10B23   15B22|TestSep_WellNum   11A2211A22B11A2211A22|11A2211A2211A22W11A2211A...
3   10B23   15B22|TestSep_GasPresValve  11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
4   10B23   15B22|TestSep_Temp  11A2211A22B11A2211A22|11A2211A2211A22T11A2211A22
5   10B23   WHT 11A22W11A22H11A22T11A22

Thanks for your help

>Solution :

(|) is a special character in regex, you need to escape it.

df["tag2"] = df["tag"].str.replace(r"^\w*\|", "11A22|", regex=True)


Output :

print(df)

    well                          tag                         tag2
0  10B23     15B22|TestSep_OutletFlow     11A22|TestSep_OutletFlow
1  10B23  15B22|TestSep_GasOutletFlow  11A22|TestSep_GasOutletFlow
2  10B23        15B22|TestSep_WellNum        11A22|TestSep_WellNum
3  10B23   15B22|TestSep_GasPresValve   11A22|TestSep_GasPresValve
4  10B23           15B22|TestSep_Temp           11A22|TestSep_Temp
5  10B23                          WHT                          WHT
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading