I’m trying to parse an automobile info text file into a semicolon-delimited file. The fields should be Year, Make, Model, Keyfob Description, FCC ID. The regex alternation should separate the Model from the Keyfob Description. I thought the alternation would use the first match, but it always chooses the word Remote, no matter its placement in the alternation. Why? How can I make it select Smart instead of Remote as the start of the group?
The text file I’m parsing:
2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)
2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)
2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)
2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)
The regular expression:
if( $row =~ /^(\d+)\s(\w+)\s(\w.*)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/ ){ print "$1;$2;$3;$4$5;$6\n" };
The result, where it keeps choosing Remote as the best option in the alternation:
2017;Chevrolet;Cruze;Remote Flip Key Fob 4B w/ Trunk;LXP-T004 (XL8 Model)
2013;Chevrolet;Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass,;Remote Start;OUC60270 / OUC60221
2021;Jeep;Grand Cherokee Smart;Remote Key Fob 3B;M3N-40821302
2010;Acura;TL Smart;Remote Key Fob 4B w/ Trunk;M3N5WY8145
2006;Mazda;5;Remote Flip Key Fob 3B;BGBX1T478SKE125-01
Is the regex selecting Remote instead of Smart because it is longer and there are multiple matches? How can I make it select the first match instead of the longest when there are multiple matches in a line?
I’ve tried rearranging the words in the alternation, but it always chooses Remote.
>Solution :
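The .* in the third group is greedy: it tries to match the longest substring possible and only gives characters back until the rest of the pattern can match. As a result the alternation matches the last keyword in the line, not the first, and the order of the words inside the alternation makes no difference. Add the ? character after .* to make it match the minimal number of characters: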
/^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC\:\s(.+)\,\s.+/
#                    ~
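To see the difference side by side, here is a minimal sketch that runs both patterns over the four sample rows from the question. The names @rows, $greedy, $lazy, and to_fields are illustrative and not part of the original post, and the unneeded backslashes before : and , have been dropped, which does not change what the patterns match.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @rows = (
    '2013 Chevrolet Tahoe Keyless Entry Remote Key Fob 6B w/ Hatch, Rear Glass, Remote Start (FCC: OUC60270 / OUC60221, P/N: 15913427)',
    '2021 Jeep Grand Cherokee Smart Remote Key Fob 3B (FCC: M3N-40821302, P/N: 68143502AA)',
    '2010 Acura TL Smart Remote Key Fob 4B w/ Trunk (FCC: M3N5WY8145, P/N: 72147-TK4-A71)',
    '2006 Mazda 5 Remote Flip Key Fob 3B (FCC: BGBX1T478SKE125-01, P/N: CC43-67-5RYC)',
);

# Original pattern: the greedy .* in group 3 runs to the LAST keyword.
my $greedy = qr/^(\d+)\s(\w+)\s(\w.*)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC:\s(.+),\s.+/;

# Fixed pattern: the non-greedy .*? in group 3 stops at the FIRST keyword.
my $lazy = qr/^(\d+)\s(\w+)\s(\w.*?)\s(Smart|Keyless|KEYLESS|Remote)(\s\w+.*)\s\(FCC:\s(.+),\s.+/;

# Hypothetical helper: returns the semicolon-delimited record,
# or undef when the row does not match the pattern.
sub to_fields {
    my ($re, $row) = @_;
    return $row =~ $re ? "$1;$2;$3;$4$5;$6" : undef;
}

for my $row (@rows) {
    print "greedy: ", to_fields($greedy, $row) // 'no match', "\n";
    print "lazy:   ", to_fields($lazy,   $row) // 'no match', "\n";
}
```

With the non-greedy version, the Tahoe row splits at Keyless and the Jeep and Acura rows split at Smart, so the Model field no longer swallows part of the key fob description. The Mazda row is the same under both patterns because it contains only one keyword.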