Regular Expression for Japanese Full-Width Numbers Returning All Full-Width Characters

I am writing a PHP file that takes the contents of a web page, filters for full-width numbers, and converts them to half-width. Currently, my program returns all full-width characters on the page, not just the numbers.

<?php
$fullwidthPattern = '/([０-９])/';

$handle = curl_init();
 
$url = (URL removed for privacy reasons);

function getFullWidth(string $input) {
    global $fullwidthPattern;
    return preg_match($fullwidthPattern, $input);
}

curl_setopt($handle, CURLOPT_URL, $url);

curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
 
$output = curl_exec($handle);
 
curl_close($handle);

function jp_str_split($str) {
    $pattern = '/(?<!^)(?!$)/u';
    return preg_split($pattern,$str);
}

$jpContents = jp_str_split($output);

$numbers = array_filter($jpContents, 'getFullWidth');

foreach($numbers as $x) {
    echo $x;
}

My regular expression is currently ‘/([０-９])/’, but I have also tried ‘/[０-９]/’ and ‘/[０１２３４５６７８９]/’.

Solution:

Splitting should be done with

function jp_str_split($str) {
    preg_match_all('/\X/u', $str, $matches);
    return $matches[0]; 
}

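The \X construct matches a whole Unicode extended grapheme cluster, so each array element is one complete user-perceived character. Your (?<!^)(?!$) regex matches every position between two characters; with the u flag those positions are code point boundaries, so a character built from several code points (for example a kana followed by a combining voiced sound mark) gets split into pieces.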
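A small check of what \X changes compared with the original splitter, as a sketch. `codepoint_split` is a hypothetical name wrapping the question's `(?<!^)(?!$)` split, and the sample is "が" written in decomposed form: か (U+304B) followed by the combining voiced sound mark U+3099. It assumes PHP 7.0+ for the `\u{...}` string escapes.

```php
<?php
// The answer's splitter: one array element per extended grapheme cluster.
function jp_str_split(string $str): array {
    preg_match_all('/\X/u', $str, $matches);
    return $matches[0];
}

// Hypothetical wrapper around the question's splitter: with the u flag it
// breaks at every code point boundary except the start and end of the string.
function codepoint_split(string $str): array {
    return preg_split('/(?<!^)(?!$)/u', $str);
}

// "が" in decomposed form: two code points that display as one character.
$decomposed = "\u{304B}\u{3099}";

var_dump(count(codepoint_split($decomposed))); // int(2)
var_dump(count(jp_str_split($decomposed)));    // int(1)
```

For precomposed text (the usual case on Japanese web pages) both splitters return the same elements; \X only makes a difference when combining marks are present.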

Also, since you are matching multibyte (full-width) digits, you must pass the u flag to the second regex as well:

$fullwidthPattern = '/([０-９])/u';
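To see why the original filter let every full-width character through, the sketch below runs the same character class with and without the u flag on a short sample. Without u, PCRE reads the pattern as bytes: ０ and ９ are EF BC 90 and EF BC 99 in UTF-8, so the class collapses to the bytes EF and BC plus the range 0x90 to 0xEF, which covers the first byte of every three-byte UTF-8 character (kana, kanji, full-width letters). The `toHalfWidthDigits` helper is a hypothetical name for the half-width conversion the question asks about; the sketch assumes UTF-8 input and PHP 7.4+ (arrow functions).

```php
<?php
// Full-width digits ０ (U+FF10) to ９ (U+FF19). The \u{} escapes produce the
// same UTF-8 bytes as a literal [０-９] in a UTF-8 source file.
$bytePattern = "/[\u{FF10}-\u{FF19}]/";   // no u flag: class parsed byte by byte
$utf8Pattern = "/[\u{FF10}-\u{FF19}]/u";  // u flag: class parsed as code points

// Split the sample into code points so each one is tested on its own.
$chars = preg_split('//u', 'あＡ１２x3', -1, PREG_SPLIT_NO_EMPTY);

$byteMatches = array_values(array_filter($chars, fn($c) => preg_match($bytePattern, $c) === 1));
$utf8Matches = array_values(array_filter($chars, fn($c) => preg_match($utf8Pattern, $c) === 1));

// Hypothetical helper: replace each full-width digit with its ASCII digit.
function toHalfWidthDigits(string $s): string {
    $map = array_combine(
        preg_split('//u', '０１２３４５６７８９', -1, PREG_SPLIT_NO_EMPTY),
        str_split('0123456789')
    );
    return strtr($s, $map); // array form of strtr works on whole multibyte keys
}

echo implode(' ', $byteMatches), "\n";                    // あ Ａ １ ２
echo implode(' ', $utf8Matches), "\n";                    // １ ２
echo toHalfWidthDigits(implode('', $utf8Matches)), "\n";  // 12
```

Where the mbstring extension is available, `mb_convert_kana($text, 'n', 'UTF-8')` performs the same full-width to half-width digit conversion without a hand-built map.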

Discover more from Dev solutions
