So, I’ve been trying to create a program that converts romaji (romanized Japanese) to hiragana. I would like to match only the first letter of the doubled consonants ‘kk’, ‘ss’, ‘tt’, and ‘pp’, so that I can replace it with ‘っ’.
My attempt:
re.sub(r'(?=([kstp]))\1', 'っ', string)
I expect 'tooka' to stay 'tooka' and 'yokka' to become 'yoっka', but my regex appears to match every single character in [kstp].
Is there an easy way I can fix this?
>Solution :
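You put the positive lookahead (?=...) in the wrong position. As written, the lookahead captures the next character without consuming it, and \1 then consumes that same character, so the pattern matches any single k, s, t, or p. Capture the consonant first, then look ahead for a repeat of it. Try: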
import re
lst = ['tooka', 'yokka', 'chotto', 'koppu']
print([re.sub(r'([kstp])(?=\1)', 'っ', s) for s in lst])
# ['tooka', 'yoっka', 'choっto', 'koっpu']
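Alternatively, a version without lookahead works too: re.sub(r'([kstp])\1', r'っ\1', s). It matches both consonants and puts the second one back in the replacement.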
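To see the difference between the original attempt and the fix side by side, here is a minimal sketch. The helper name mark_sokuon and the extra test word 'kissaten' are my own additions for illustration, not something from the question.

```python
import re

# Capture one of k/s/t/p, then assert (without consuming) that the
# same letter follows. Only the first letter of the pair is matched.
DOUBLED = re.compile(r'([kstp])(?=\1)')

def mark_sokuon(romaji: str) -> str:
    """Replace the first letter of kk/ss/tt/pp with the small tsu.

    Hypothetical helper name, used here only for illustration.
    """
    return DOUBLED.sub('っ', romaji)

words = ['tooka', 'yokka', 'chotto', 'koppu', 'kissaten']
print([mark_sokuon(w) for w in words])
# ['tooka', 'yoっka', 'choっto', 'koっpu', 'kiっsaten']

# The pattern from the question: the lookahead and \1 both land on the
# same character, so every single k/s/t/p is replaced.
print(re.sub(r'(?=([kstp]))\1', 'っ', 'yokka'))
# yoっっa
```

Compiling the pattern once is only a convenience if you call it on many words; re.sub with the raw pattern string behaves the same way.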