Perl 5 – regex alternation not returning first match


I’m trying to parse an automobile info text into a semicolon-delimited file. The fields should be Year, Make, Model, Keyfob Description, FCC ID. The regex alternation should be able to separate the Model from the Keyfob Desc. I thought the alternation would use the first match, but it always chooses the word Remote, no matter its placement in the alternation. Why? How can I make it select Smart instead of Remote as the start of the group?
The text file I’m parsing:

2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)
2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)
2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)
2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)

The regular expression:

if( $row =~ /^(\d+)\s(\w+)\s(\w.*)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/ ){ print "$1;$2;$3;$4$5;$6\n" };

The result, where it keeps choosing Remote as the best option in the alternation:

2017;Chevrolet;Cruze;Remote Flip Key Fob 4B w/ Trunk;LXP-T004 (XL8 Model)
2013;Chevrolet;Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass,;Remote Start;OUC60270 / OUC60221
2021;Jeep;Grand Cherokee Smart;Remote Key Fob 3B;M3N-40821302
2010;Acura;TL Smart;Remote Key Fob 4B w/ Trunk;M3N5WY8145
2006;Mazda;5;Remote Flip Key Fob 3B;BGBX1T478SKE125-01

Is the regex selecting Remote instead of Smart because it is longer and there are multiple matches? How can I make it select the first match instead of the longest when there are multiple matches in a line?

I’ve tried rearranging the words in the alternation, but it always chooses Remote.

>Solution :

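The alternation is not the problem. The greedy .* in the third group first grabs as much of the line as possible and then backtracks from the right, so the match settles on the last keyword in the line that still lets the rest of the pattern succeed, and that is Remote. Add the ? character after the quantifier to make it match the minimal number of characters, so the group stops at the first keyword instead: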

/^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/
#                    ~
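
To see the effect on the sample rows, here is a minimal sketch that applies the non-greedy pattern to the four lines from the question. The helper name parse_row and the inline @rows list are assumptions made for illustration; the original code reads $row from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): returns the
# semicolon-delimited record, or nothing if the row does not match.
sub parse_row {
    my ($row) = @_;

    # Same pattern as above; the only change from the question is the
    # non-greedy (\w.*?) in the Model group.
    if ( $row =~ /^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/ ) {
        return "$1;$2;$3;$4$5;$6";
    }
    return;
}

# Sample rows copied from the question.
my @rows = (
    '2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)',
    '2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)',
    '2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)',
    '2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)',
);

for my $row (@rows) {
    my $record = parse_row($row);
    print "$record\n" if defined $record;
}
```

With this change the Jeep row should come out as 2021;Jeep;Grand Cherokee;Smart Remote Key Fob 3B;M3N-40821302, and the Tahoe row should split at Keyless rather than at the last Remote.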
