I’m currently migrating a very old typo3 cms to wordpress 6.1. In order to get links stored in the db working I have to transform them from a proprietary non-xml format to html using a regular expression:
<link fileadmin/user_upload/file47.pdf - download>Linktext</link>
has to transformed with this regex
<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>
to this link:
<a href="fileadmin/user_upload/file47.pdf">Linktext</a>
I’ve tested the regular expression here https://regex101.com/r/cF5ZVR/1 – it works, and I’m also able to use it in php 7.4:
php > $pattern = '~<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>~';
php > $replacement = '<a href="$1">$2</a>';
php > $string = '<link fileadmin/user_upload/file47.pdf - download>Linktext</link>';
php > echo preg_replace($pattern, $replacement, $string);
<a href="fileadmin/user_upload/file47.pdf">Linktext</a>
But when I try to do it with a sql statement in MariaDB 10.3.38 I don’t get any result:
select regexp_replace('<link fileadmin/user_upload/file47.pdf - download>Linktext</link>', '<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>', '<a href="\\1">\\2</a>') as replaced;
+-------------------------------------------------------------------+
| replaced |
+-------------------------------------------------------------------+
| <link fileadmin/user_upload/file47.pdf - download>Linktext</link> |
+-------------------------------------------------------------------+
1 row in set (0,000 sec)
I’ve checked the regex grammar of MariaDB: https://mariadb.com/kb/en/pcre/ and everything seems to look fine.
Can anyone give me a hint what I’m missing here? Thanks!
>Solution :
You select should be:
select regexp_replace('<link fileadmin/user_upload/file47.pdf - download>Linktext</link>', '<link\\s(fileadmin\\S*?)[>\\s].+?>(.+?)<\\/link>', '<a href="\\1">\\2</a>') as replaced;
You have to escape \
symbols in you regexp. You have done it in replacement string, but missed in regex itself.
From MariaDB KB on regex:
Note: Because MariaDB uses the C escape syntax in strings (for example,
\n
to represent the newline character), you must double any\
that you use in your REGEXP strings.