Trying to split a comma-separated string, ignoring quotes and brackets

I’m trying to split a text into comma-separated groups, except when the comma is inside double or single quotes, or inside brackets.

e.g.

  1. a,b=456 should find a and b=456
  2. a='123,456',b should find a='123,456' and b
  3. a=x(1,2,3),b,c should find a=x(1,2,3) and b and c

I have tried str_getcsv and some preg_split but I can’t seem to get the right pattern.

Using the following code

function test($n, $a,$b) {
    echo "Test $n";
    if ( $a===$b ) echo "=<span style='color:green'>CORRECT ************************</span>";
    else echo "=<span style='color:red'>WRONG</span>";
    echo "<PRE>".print_r($b, true)."</PRE>";
    echo "<HR>\n";
}

$t=    'lorem ipsum=123,delor=\'1,456\',sit="123,456",amet=xxx(2,3),"consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."';
$want=["lorem ipsum=123","delor='1,456'","sit=\"123,456\"","amet=xxx(2,3)","consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."];

and

echo "WANTED.<PRE style='color:green'>".print_r($want, true)."</PRE><HR>";
//Array
//(
//    [0] => lorem ipsum=123
//    [1] => delor='1,456'
//    [2] => sit="123,456"
//    [3] => amet=xxx(2,3)
//    [4] => consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
//)


test("1 explode", $want, explode(",", $t));
// Test 1 explode=WRONG
// Array
// (
//     [0] => lorem ipsum=123
//     [1] => delor='1
//     [2] => 456'
//     [3] => sit="123
//     [4] => 456"
//     [5] => amet=xxx(2
//     [6] => 3)
//     [7] => "consectetur adipiscing elit
//     [8] =>  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
// )


test("2 str_getcsv", $want, str_getcsv($t, ",", "'"));
// Test 2 str_getcsv=WRONG
// Array
// (
//     [0] => lorem ipsum=123
//     [1] => delor='1
//     [2] => 456'
//     [3] => sit="123
//     [4] => 456"
//     [5] => amet=xxx(2
//     [6] => 3)
//     [7] => "consectetur adipiscing elit
//     [8] =>  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
// )


test("3 str_getcsv", $want, str_getcsv($t, ",", "\""));
// Test 3 str_getcsv=WRONG
// Array
// (
//     [0] => lorem ipsum=123
//     [1] => delor='1
//     [2] => 456'
//     [3] => sit="123
//     [4] => 456"
//     [5] => amet=xxx(2
//     [6] => 3)
//     [7] => "consectetur adipiscing elit
//     [8] =>  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
// )    


test("4 preg_split", $want, preg_split("/,/", $t));
// Test 4 preg_split=WRONG
// Array
// (
//     [0] => lorem ipsum=123
//     [1] => delor='1
//     [2] => 456'
//     [3] => sit="123
//     [4] => 456"
//     [5] => amet=xxx(2
//     [6] => 3)
//     [7] => "consectetur adipiscing elit
//     [8] =>  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
// )

I’ve lost a huge amount of time searching and trying different patterns, and I’m sure I could have written a string parser more quickly. Can someone suggest a pattern that handles this?

I’ve put a sample test on https://onlinephp.io/c/3f4d3 to run this code

Thanks

Solution:

I suggest using

preg_match_all('~(?:\'[^\']*\'|"[^"]*"|(\((?:[^()]++|(?1))*\))|[^\'",])+~', $text, $matches)

Or, if there can be escape sequences inside the quoted substrings:

preg_match_all('~(?:\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'|"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|(\((?:[^()]++|(?1))*\))|[^\'",])+~s', $text, $matches)

See the regex demo.
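As a rough sketch of how the first pattern could be applied to the sample string from the question: the full matches end up in $matches[0], one per field. The helper name split_fields() is made up for this illustration. The final array_map step is an assumption based on the question's $want array, which shows the last field without its surrounding double quotes; the pattern itself keeps them.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// top-level fields found by the first, non-escape-aware pattern.
function split_fields(string $text): array
{
    $pattern = '~(?:\'[^\']*\'|"[^"]*"|(\((?:[^()]++|(?1))*\))|[^\'",])+~';
    preg_match_all($pattern, $text, $matches);
    // $matches[0] holds the full matches, one entry per field.
    return $matches[0];
}

$t = 'lorem ipsum=123,delor=\'1,456\',sit="123,456",amet=xxx(2,3),"consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."';

$raw = split_fields($t);

// Assumption: a field that is entirely wrapped in double quotes should lose
// them, as in the question's $want array. Requires PHP 7.4+ (arrow function).
$fields = array_map(
    fn(string $f): string => preg_match('~^"[^"]*"$~', $f) ? substr($f, 1, -1) : $f,
    $raw
);

print_r($fields);
```

Because the group must match at least one character, empty fields are dropped: a,,b yields a and b, with no empty string in between.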

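Details (for the second pattern, written as it appears inside the PHP single-quoted string, so \\\\ stands for one literal backslash in the subject):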

  • (?: – start of a non-capturing group (acting as a container here):
    • \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'| – a string between single quotes with escape sequences support, or
    • "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"| – a string between double quotes with escape sequences support, or
    • (\((?:[^()]++|(?1))*\))| – a substring between balanced, possibly nested, parentheses ((?1) recurses into capture group 1), or
    • [^\'",] – a char other than ', " and ,
  • )+ – end of the group, repeated one or more times.
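To illustrate the difference between the two patterns, here is a small sketch with an input that contains a backslash-escaped quote inside a quoted value. The input string is invented for this example. Both regexes are written as nowdocs, so they appear without the extra PHP-level escaping and are otherwise the same patterns as above.

```php
<?php
// The first pattern: quoted substrings without escape-sequence support.
$simple = <<<'REGEX'
~(?:'[^']*'|"[^"]*"|(\((?:[^()]++|(?1))*\))|[^'",])+~
REGEX;

// The second pattern: a backslash inside quotes escapes the next character.
$escapeAware = <<<'REGEX'
~(?:'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|(\((?:[^()]++|(?1))*\))|[^'",])+~s
REGEX;

// Hypothetical input; the actual string content is: a='it\'s, here',b
$text = "a='it\\'s, here',b";

preg_match_all($simple, $text, $m1);
preg_match_all($escapeAware, $text, $m2);

print_r($m1[0]); // the quoted value is cut short at the escaped quote
print_r($m2[0]); // the quoted value stays in one piece: a='it\'s, here' and b
```

With the first pattern the escaped quote is taken as the closing quote, so the comma after it splits the value. The second pattern treats the backslash and the character after it as a unit.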