Regex to match any number of words if any of the words have a specific character

I have a Python list of several cities and I’m trying to match only those that contain certain accented characters. I managed to match many of them, but not all (it fails on ‘El Fuerte de la Unión’, for example).

I also feel that my pattern could be more efficient: I’m chaining word and whitespace tokens, but there must be a better way. I’m not sure how to construct a search that allows for any number of words or spaces before the word that contains the required characters.

This is my syntax:

'(\w*\s*\w*\s*\w*[áíúóéó]+.*?\s*\w*)'

This is a portion of the list:

'Name', 'San Marcos Tecomaxusco', 'Teapa', 'Tatatila', 'Sisal', 'Simojovel de Allende', 'El Fuerte de la Unión', 'Santiago Zoochila', 'Santiago Nuyoó', 'Miahuatlán', 'Santa María Huazolotitlán', 'Santa María Chimalhuacán', 'Santa María Apazco', 'Santa Cruz Ozolotepec', 'San Simón', 'Huixcolotla', 'San Rafael Ixtapalucan', 'San Pedro Mixtepec', 'San Pedro Huilotepec', 'San Miguel Balderas', 'San Mateo Almomoloha', 'San Martín Chalchicuautla', 'Teolocholco', 'San Luis Ayucán', 'San Juan Zitlaltepec', 'San José de las Flores', 'San Jerónimo Xayacatlán', 'San Hipólito', 'San Francisco Oxtotilpan', 'San Cristóbal de las Casas'

Here is a link to regex101 with the full list: https://regex101.com/r/6WlY8o/1

Solution:

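Your pattern allows at most three words before the accented character, so a name with more words in front of it, such as ‘El Fuerte de la Unión’, cannot be matched in full. Since every name in the list is wrapped in single quotes, you can anchor on the quotes instead and try the following regex: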

'[^']*[áíúóéó][^']*'

Regex Explanation:

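  • ': opening single quote
  • [^']*: zero or more characters other than a single quote
  • [áíúóéó]: exactly one of the listed accented characters (ó appears twice in the class, which is harmless)
  • [^']*: zero or more characters other than a single quote
  • ': closing single quote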
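As a rough illustration of the pattern above in Python, here is a minimal sketch. It assumes the quoted names are available as one string; the variable names `text`, `pattern`, and `matches` are made up for this example, and the sample is only a handful of names from the question's list.

```python
import re

# A few quoted city names taken from the question's list (sample only).
text = "'Teapa', 'El Fuerte de la Unión', 'Santiago Nuyoó', 'Sisal', 'San Simón'"

# The pattern from the answer, used verbatim (including the duplicated "ó").
pattern = re.compile(r"'[^']*[áíúóéó][^']*'")

# findall returns each full match, quotes included.
matches = pattern.findall(text)
print(matches)
# ["'El Fuerte de la Unión'", "'Santiago Nuyoó'", "'San Simón'"]
```

Because the match is bounded by the quotes rather than by a fixed number of words, the number of words before the accented character no longer matters.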

If you don’t want the single quotes included in the match, you can use lookarounds instead:

(?<=')[^']*[áíúóéó][^']*(?=')
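The lookaround variant can be tried in Python as sketched below. The second half is an alternative that is not part of the answer above: since the question starts from a Python list, each element can be tested directly with a bare character class, with no quotes involved. The names `cities`, `lookaround`, `accented`, `bare`, and `with_accents` are hypothetical, chosen for this example.

```python
import re

# Sample of the question's list as actual Python strings.
cities = ['Teapa', 'El Fuerte de la Unión', 'Santiago Nuyoó', 'Sisal', 'San Simón']

# 1) Lookaround pattern from the answer, applied to a quoted, comma-separated string.
text = ", ".join(f"'{c}'" for c in cities)
lookaround = re.compile(r"(?<=')[^']*[áíúóéó][^']*(?=')")
bare = lookaround.findall(text)  # names without the surrounding quotes

# 2) Alternative (an assumption about the use case): filter the list itself.
accented = re.compile(r"[áíúóé]")
with_accents = [c for c in cities if accented.search(c)]

print(bare)
print(with_accents)
# Both print: ['El Fuerte de la Unión', 'Santiago Nuyoó', 'San Simón']
```

Filtering the list element by element avoids depending on the quote characters at all, which may be simpler if the data is already a list rather than a single string.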

