I have the following string:
sentence = "<nonderivativetransaction><securitytitle><value>Common Stock</value></securitytitle><transactiondate><value>2003-08-19</value></transactiondate><transactioncoding><transactionformtype>4</transactionformtype><transactioncode>S</transactioncode><equityswapinvolved>0</equityswapinvolved></transactioncoding><transactionamounts><transactionshares><value>100</value></transactionshares><\ntransactionPricePerShare><value>42.31</value><transactionacquireddisposedcode><value>D</value></transactionacquireddisposedcode></transactionamounts><posttransactionamounts><sharesownedfollowingtransaction><value>82291</value></sharesownedfollowingtransaction></posttransactionamounts><ownershipnature><directorindirectownership><value>D</value></directorindirectownership><natureofownership><value></value></natureofownership></ownershipnature></nonderivativetransaction>"
I would like to look for strings between "</transactionshares>" and "<transactionacquireddisposedcode>" by using:
import re
re.search("</transactionshares>(.*)<transactionacquireddisposedcode>", str(i))
But it outputs None, which is wrong.
Expected output:
<\ntransactionPricePerShare><value>42.31</value>
I tested the regex search string using regex101 and the output is correct: https://regex101.com/r/bVT6JU/1
I thought the "/" in the search string was the cause of this problem, so I tried escaping it:
<\/transactionshares>(.*)<transactionacquireddisposedcode>
I still don't get my expected output.
Both search strings work on regex101.
Thank you for any help.
>Solution :
The \n is what is tripping you up. By default, . matches any character except a newline, so .* cannot match across it. Adding flags=re.DOTALL makes your .* include the \n in:
<\ntransactionPricePerShare>
I also changed the .* to .*? to make it non-greedy, which you may want. That way it stops at the first <transactionacquireddisposedcode>.
import re
sentence = "<nonderivativetransaction><securitytitle><value>Common Stock</value></securitytitle><transactiondate><value>2003-08-19</value></transactiondate><transactioncoding><transactionformtype>4</transactionformtype><transactioncode>S</transactioncode><equityswapinvolved>0</equityswapinvolved></transactioncoding><transactionamounts><transactionshares><value>100</value></transactionshares><\ntransactionPricePerShare><value>42.31</value><transactionacquireddisposedcode><value>D</value></transactionacquireddisposedcode></transactionamounts><posttransactionamounts><sharesownedfollowingtransaction><value>82291</value></sharesownedfollowingtransaction></posttransactionamounts><ownershipnature><directorindirectownership><value>D</value></directorindirectownership><natureofownership><value></value></natureofownership></ownershipnature></nonderivativetransaction>"
# re.DOTALL lets . match the newline; .*? stops at the first <transactionacquireddisposedcode>
match = re.search(r"</transactionshares>(.*?)<transactionacquireddisposedcode>", sentence, flags=re.DOTALL).group(1)
print(match)
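To see the effect of each change in isolation, here is a minimal sketch on a shortened, made-up string (the tag names `a`, `price` and `code` are placeholders, not from your data). It compares the default behaviour, greedy matching with re.DOTALL, and non-greedy matching with re.DOTALL:

```python
import re

# Shortened, hypothetical sample with a real newline inside one tag
text = "<a>1</a><\nprice><value>42.31</value><code>D</code><code>X</code>"

# Without DOTALL, . cannot cross the newline, so re.search returns None
no_flag = re.search(r"</a>(.*)<code>", text)

# With DOTALL, the greedy .* runs up to the LAST <code>
greedy = re.search(r"</a>(.*)<code>", text, flags=re.DOTALL)

# With DOTALL, the non-greedy .*? stops at the FIRST <code>
lazy = re.search(r"</a>(.*?)<code>", text, flags=re.DOTALL)

print(no_flag)
print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

Since re.search returns None when nothing matches, calling .group(1) directly on the result raises AttributeError in that case. If some of your strings may lack these tags, check the result for None first.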