Perl 5 – regex alternation not returning first match

I’m trying to parse an automobile info text file into a semicolon-delimited file. The fields should be Year, Make, Model, Keyfob Description, FCC ID. The regex alternation should separate the Model from the Keyfob Description. I thought the alternation would use the first match, but it always chooses the word Remote, no matter its placement in the alternation. Why? How can I make it select Smart instead of Remote as the start of the group?
The text file I’m parsing:

2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)
2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)
2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)
2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)

The regular expression:

if( $row =~ /^(\d+)\s(\w+)\s(\w.*)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/ ){ print "$1;$2;$3;$4$5;$6\n" };

The result, where it keeps choosing Remote as the best option in the alternation:

2017;Chevrolet;Cruze;Remote Flip Key Fob 4B w/ Trunk;LXP-T004 (XL8 Model)
2013;Chevrolet;Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass,;Remote Start;OUC60270 / OUC60221
2021;Jeep;Grand Cherokee Smart;Remote Key Fob 3B;M3N-40821302
2010;Acura;TL Smart;Remote Key Fob 4B w/ Trunk;M3N5WY8145
2006;Mazda;5;Remote Flip Key Fob 3B;BGBX1T478SKE125-01

Is the regex selecting Remote instead of Smart because it is longer and there are multiple matches? How can I make it select the first match instead of the longest when there are multiple matches in a line?

I’ve tried rearranging the words in the alternation, but it always chooses Remote.

Solution:

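The alternation is not the culprit. The .* in the third group is greedy: it first consumes as much of the line as it can, then backtracks from the right until the rest of the pattern matches. The fourth group therefore ends up with the right-most keyword that still lets the rest of the pattern match, regardless of the order of the alternatives. Add the ? character after the quantifier to make it non-greedy, so that it matches as few characters as possible and stops at the first keyword: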

/^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/
#                    ~
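
To see the difference side by side, here is a minimal sketch that runs both patterns over the sample rows from the question. The names `split_row`, `$greedy`, and `$lazy` are made up for this illustration; the patterns themselves are the ones quoted above.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @rows = (
    '2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)',
    '2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)',
    '2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)',
    '2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)',
);

# Original pattern: the third group uses a greedy .*
my $greedy = qr/^(\d+)\s(\w+)\s(\w.*)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/;

# Fixed pattern: the third group uses a non-greedy .*?
my $lazy = qr/^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/;

# Hypothetical helper: returns the semicolon-delimited record,
# or nothing when the row does not match the pattern.
sub split_row {
    my ($re, $row) = @_;
    return "$1;$2;$3;$4$5;$6" if $row =~ $re;
    return;
}

for my $row (@rows) {
    print 'greedy: ', split_row($greedy, $row) // 'no match', "\n";
    print 'lazy:   ', split_row($lazy,   $row) // 'no match', "\n";
}
```

For the Jeep row, the greedy pattern yields `2021;Jeep;Grand Cherokee Smart;Remote Key Fob 3B;M3N-40821302`, while the non-greedy one yields `2021;Jeep;Grand Cherokee;Smart Remote Key Fob 3B;M3N-40821302`. One caveat with the non-greedy version: it splits at the first keyword after the make, so a model name that itself contains Smart, Keyless, or Remote would be cut short.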