Preg match html language with lang attribute php

I would like to use preg_match in PHP to extract the site language from an HTML document.

My preg_match:

$sitelang = preg_match('!<html lang="(.*?)">!i', $result, $matches) ? $matches[1] : 'Site Language not detected';

This works when the html tag has only a lang attribute, without any class or id.
For example:
Input:

<html lang="de">

Output:

de

But when the html tag has additional attributes, like this:
Input:

<html lang="en" class="desktop-view not-mobile-device text-size-normal anon">

Output:

en" class="desktop-view not-mobile-device text-size-normal anon

I need just the language code (en, de, en-En, de-DE).

Thanks for your advice or code.

>Solution :

Standard disclaimer about using regex to parse HTML aside, there are two things you likely want. First, get rid of the closing bracket in your pattern. Once you have the close quote, the rest of the tag doesn’t matter. Second, make sure what’s inside the quotes doesn’t itself contain quotes.

Current: open quote, then anything, then a close quote immediately followed by the closing bracket:

preg_match('!<html lang="(.*?)">!i', $result, $matches)

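This means if you have lang="foo" class="bar"> you get foo" class="bar as the match. The quantifier .*? is lazy, not greedy, but the pattern requires the closing quote to be followed directly by >. The first closing quote is followed by a space, so the engine keeps extending the capture until it reaches the last quote before the bracket.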
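
A quick way to see this over-capture is to run the original pattern against a tag that has a second attribute. This is a minimal sketch; the sample $html string is an assumption modeled on the second input in the question:

```php
<?php
// Hypothetical sample input, modeled on the second example in the question.
$html = '<html lang="en" class="desktop-view not-mobile-device text-size-normal anon">';

// Original pattern: the closing quote must be followed directly by ">".
preg_match('!<html lang="(.*?)">!i', $html, $matches);

// The only quote followed by ">" is the last one, so the capture
// runs across the class attribute as well.
$captured = $matches[1];
echo $captured, PHP_EOL;
// en" class="desktop-view not-mobile-device text-size-normal anon
```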

New: inside the quotes, one or more of anything but a quote:

preg_match('!<html lang="([^"]+)"!i', $result, $matches)

If you want to be more resilient, change the hard space to one or more whitespace chars:

preg_match('!<html\s+lang="([^"]+)"!i', $result, $matches)
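
Wrapped in a small helper, the final pattern might be used as follows. This is only a sketch: the function name detectSiteLang and the sample strings are assumptions for illustration, and the fallback message is the one from the question.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function detectSiteLang(string $html): string
{
    // \s+ tolerates any whitespace between "html" and "lang";
    // [^"]+ stops the capture at the first closing quote.
    if (preg_match('!<html\s+lang="([^"]+)"!i', $html, $matches) === 1) {
        return $matches[1];
    }
    return 'Site Language not detected';
}

echo detectSiteLang('<html lang="de">'), PHP_EOL;                            // de
echo detectSiteLang('<html lang="en" class="desktop-view anon">'), PHP_EOL;  // en
echo detectSiteLang("<HTML\n  lang=\"de-DE\">"), PHP_EOL;                    // de-DE
echo detectSiteLang('<html class="x" lang="en">'), PHP_EOL;                  // Site Language not detected
```

Note the last call: this pattern still assumes that lang is the first attribute of the html tag and that its value is double-quoted. Pages that place another attribute first, or use single quotes, fall through to the fallback message.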