
Split a string by element class in PHP while keeping all of the original text in the array

I'd like to split a long text into chunks. I need to split by element class (the element can be a heading, p, span, div, or some other tag I don't know in advance).
For example, given a string like:

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';

I'd like to split it on elements with the cut class into an array, keeping all of the text.
Expected result:

$array = array(
   0 => 'Hi this is a long ',
   1 => '<span class="cut">string</span>',
   2 => ' and I need to ',
   3 => '<span class="cut">split it into chunks</span>',
   4 => ' and I need help for ',
   5 => '<span class="cut">this</span>'
);

I couldn't find any example on the web.

I found only this one, but it returns only the elements with the class and drops all the other text, so I don't know whether it is useful for my purpose:

 $domdocument = new DOMDocument();
 $domdocument->loadHTML($contenuto);
 $a = new DOMXPath($domdocument);
 $elements = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' cut ')]");

 for ($i = $elements->length - 1; $i > -1; $i--) {
    var_dump($elements->item($i)->firstChild->nodeValue);
 }

Solution:

One option is to collect every chunk with a preg_match_all regex:

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
preg_match_all("/<(\w+).*?>.*?<\/\\1>|.*?(?=<|$)/", $string, $matches);
$lines = $matches[0];
array_pop($lines); // drop the trailing empty match produced at the end of the input
print_r($lines);

This prints:

Array
(
    [0] => Hi this is a long 
    [1] => <span class="cut">string</span>
    [2] =>  and I need to 
    [3] => <span class="cut">split it into chunks</span>
    [4] =>  and I need help for 
    [5] => <span class="cut">this</span>
)

The regex pattern used here says to match:

<(\w+).*?>  an opening HTML tag (tag name captured in group 1)
.*?         any content
<\/\\1>     the matching closing tag
|           OR
.*?         any other content until reaching, but not including
(?=<|$)     the next HTML tag or the end of the input
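
Note that the pattern above splits on every HTML element, not only on those carrying the cut class. If only the cut elements should become separate chunks, one possible variation is sketched below. It is an illustration under assumptions, not part of the original answer: the helper name splitByClass and its parameters are hypothetical, class attributes are assumed to be double-quoted, and elements are assumed not to contain nested elements with the same tag name. It finds the matching elements with PREG_OFFSET_CAPTURE and slices the text between them by hand.

```php
<?php
// Hypothetical helper (not from the original answer): returns the chunks of
// $html, with each element carrying $class as its own array entry.
// Assumes double-quoted class attributes and no same-name nested elements.
function splitByClass(string $html, string $class): array
{
    $cls = preg_quote($class, '~');
    // <tag ... class="... cut ...">content</tag>, tag name captured in group 1
    $pattern = '~<(\w+)(?:\s[^>]*)?\sclass="(?:[^"]*\s)?' . $cls
             . '(?:\s[^"]*)?"[^>]*>.*?</\1>~s';
    preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE);

    $chunks = [];
    $pos = 0; // byte offset of the first character not yet emitted
    foreach ($matches[0] as [$element, $offset]) {
        if ($offset > $pos) {
            // plain text (or other markup) before this element
            $chunks[] = substr($html, $pos, $offset - $pos);
        }
        $chunks[] = $element;
        $pos = $offset + strlen($element);
    }
    if ($pos < strlen($html)) {
        // whatever follows the last matching element
        $chunks[] = substr($html, $pos);
    }
    return $chunks;
}

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
print_r(splitByClass($string, 'cut'));
```

For the string from the question this should give the same six chunks as the expected result. Markup without the cut class, such as a plain <b> element, stays inside the surrounding text chunk. For arbitrary or deeply nested HTML, a DOM parser is likely to be more robust than any regex.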