Problem with Python, Regex search and string "/"

June 16, 2022

I have the following string:

sentence = "<nonderivativetransaction><securitytitle><value>Common Stock</value></securitytitle><transactiondate><value>2003-08-19</value></transactiondate><transactioncoding><transactionformtype>4</transactionformtype><transactioncode>S</transactioncode><equityswapinvolved>0</equityswapinvolved></transactioncoding><transactionamounts><transactionshares><value>100</value></transactionshares>&lt;\ntransactionPricePerShare&gt;<value>42.31</value><transactionacquireddisposedcode><value>D</value></transactionacquireddisposedcode></transactionamounts><posttransactionamounts><sharesownedfollowingtransaction><value>82291</value></sharesownedfollowingtransaction></posttransactionamounts><ownershipnature><directorindirectownership><value>D</value></directorindirectownership><natureofownership><value></value></natureofownership></ownershipnature></nonderivativetransaction>"

I would like to extract the text between "</transactionshares>" and "<transactionacquireddisposedcode>" using:

import re
re.search("</transactionshares>(.*)<transactionacquireddisposedcode>", str(i))

But it outputs None, which is wrong.
Expected output:

&lt;\ntransactionPricePerShare&gt;<value>42.31</value>

I tested the regex search string using regex101 and the output is correct: https://regex101.com/r/bVT6JU/1

I thought the "/" in the search string was the cause of this problem, so I tried escaping it:

<\/transactionshares>(.*)<transactionacquireddisposedcode>

I still don’t get my expected output.
Both search strings work on regex101.

Thank you for any help.

>Solution :

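The \n in your string is what breaks the match. In a Python string literal, \n is a single newline character, and by default . matches any character except a newline, so .* cannot get past it. Adding flags=re.DOTALL makes . match newlines too, so .* can span the \n here: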

<\ntransactionPricePerShare>
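
To see the effect of the flag in isolation, here is a minimal sketch on a shortened string. The tag names <a>, <b>, <price> and <v> are made up for illustration and are not from the question; only the newline between the two markers matters.

```python
import re

# Hypothetical shortened input: a newline sits between the two markers.
text = "<a>100</a><\nprice><v>42.31</v><b>D</b>"
pattern = "</a>(.*)<b>"

# By default "." does not match "\n", so ".*" cannot reach "<b>".
default_match = re.search(pattern, text)

# With re.DOTALL, "." matches every character, newline included.
dotall_match = re.search(pattern, text, flags=re.DOTALL)

print(default_match)                  # None
print(repr(dotall_match.group(1)))    # captured text, newline shown as \n
```

The pattern is identical in both calls; only the flag changes whether the search succeeds.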

I also changed .* to .*? to make it non-greedy, which you probably want. That way the match stops at the first <transactionacquireddisposedcode> instead of running on to the last one.

import re

sentence = "<nonderivativetransaction><securitytitle><value>Common Stock</value></securitytitle><transactiondate><value>2003-08-19</value></transactiondate><transactioncoding><transactionformtype>4</transactionformtype><transactioncode>S</transactioncode><equityswapinvolved>0</equityswapinvolved></transactioncoding><transactionamounts><transactionshares><value>100</value></transactionshares>&lt;\ntransactionPricePerShare&gt;<value>42.31</value><transactionacquireddisposedcode><value>D</value></transactionacquireddisposedcode></transactionamounts><posttransactionamounts><sharesownedfollowingtransaction><value>82291</value></sharesownedfollowingtransaction></posttransactionamounts><ownershipnature><directorindirectownership><value>D</value></directorindirectownership><natureofownership><value></value></natureofownership></ownershipnature></nonderivativetransaction>"

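# re.DOTALL lets "." match the newline; "(.*?)" stops at the first closing marker.
match = re.search("</transactionshares>(.*?)<transactionacquireddisposedcode>", sentence, flags=re.DOTALL)

# re.search returns None when nothing matches, so check before calling .group().
print(match.group(1) if match else None)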
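
The difference between .* and .*? only shows up when the closing marker appears more than once, which it does not in your sample string. A rough sketch with a made-up input holding two records (the tags <s> and <code> are hypothetical stand-ins for your real tag names):

```python
import re

# Hypothetical input with two records, so the closing marker "<code>" appears twice.
text = "<s>1</s>first<code>D</code><s>2</s>second<code>A</code>"

# Greedy: ".*" grabs as much as possible and ends at the LAST "<code>".
greedy = re.search("</s>(.*)<code>", text, flags=re.DOTALL).group(1)

# Non-greedy: ".*?" grabs as little as possible and ends at the FIRST "<code>".
lazy = re.search("</s>(.*?)<code>", text, flags=re.DOTALL).group(1)

# re.findall with the non-greedy pattern returns one capture per record.
all_lazy = re.findall("</s>(.*?)<code>", text, flags=re.DOTALL)

print(greedy)
print(lazy)
print(all_lazy)
```

If your real data contains several transactions in one string, re.findall with the non-greedy pattern is probably the more convenient call, since it gives you every captured group in one list.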