Regex working in php 7.4 but not with mariadb 10.3.38 (though both use pcre)

Advertisements

I’m currently migrating a very old typo3 cms to wordpress 6.1. In order to get links stored in the db working I have to transform them from a proprietary non-xml format to html using a regular expression:

<link fileadmin/user_upload/file47.pdf - download>Linktext</link>

has to transformed with this regex

<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>

to this link:

<a href="fileadmin/user_upload/file47.pdf">Linktext</a>

I’ve tested the regular expression here https://regex101.com/r/cF5ZVR/1 – it works, and I’m also able to use it in php 7.4:

php > $pattern = '~<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>~';
php > $replacement = '<a href="$1">$2</a>';
php > $string = '<link fileadmin/user_upload/file47.pdf - download>Linktext</link>';
php > echo preg_replace($pattern, $replacement, $string);
<a href="fileadmin/user_upload/file47.pdf">Linktext</a>

But when I try to do it with a sql statement in MariaDB 10.3.38 I don’t get any result:

select regexp_replace('<link fileadmin/user_upload/file47.pdf - download>Linktext</link>', '<link\s(fileadmin\S*?)[>\s].+?>(.+?)<\/link>', '<a href="\\1">\\2</a>') as replaced;
+-------------------------------------------------------------------+
| replaced                                                          |
+-------------------------------------------------------------------+
| <link fileadmin/user_upload/file47.pdf - download>Linktext</link> |
+-------------------------------------------------------------------+
1 row in set (0,000 sec)

I’ve checked the regex grammar of MariaDB: https://mariadb.com/kb/en/pcre/ and everything seems to look fine.

Can anyone give me a hint what I’m missing here? Thanks!

>Solution :

You select should be:

select regexp_replace('<link fileadmin/user_upload/file47.pdf - download>Linktext</link>', '<link\\s(fileadmin\\S*?)[>\\s].+?>(.+?)<\\/link>', '<a href="\\1">\\2</a>') as replaced;

You have to escape \ symbols in you regexp. You have done it in replacement string, but missed in regex itself.

From MariaDB KB on regex:

Note: Because MariaDB uses the C escape syntax in strings (for example, \n to represent the newline character), you must double any \ that you use in your REGEXP strings.

Leave a ReplyCancel reply