Google Data Studio REGEXP remove text between two "_"'s

I have a string formatted as date_type_campaignname_audience_timeofday. I would like to set up five REGEXP expressions, one to pull out each substring.

I tried this to pull out the type, but it pulls the audience instead: REGEXP_EXTRACT(field_name, r"(.*?)_([^_]*)_").

>Solution :

Match ignored fragments, then capture the fragment you need:

REGEXP_EXTRACT(field_name, '^(?:[^_]+_){0}([^_]+)')  // date
REGEXP_EXTRACT(field_name, '^(?:[^_]+_){1}([^_]+)')  // type
REGEXP_EXTRACT(field_name, '^(?:[^_]+_){2}([^_]+)')  // campaignname
REGEXP_EXTRACT(field_name, '^(?:[^_]+_){3}([^_]+)')  // audience
REGEXP_EXTRACT(field_name, '^(?:[^_]+_){4}([^_]+)')  // timeofday

Try it on regex101.com. (Increase/decrease the number to see what the matches look like.)

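[^_]+ matches a fragment consisting of one or more non-underscore characters.
(?:[^_]+_){3} matches three such fragments, each followed by an underscore (e.g. a_b_c_). The (?:...) group is non-capturing, so the only captured text is the final ([^_]+). ^ anchors the match at the start of the string.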

Collectively, these regexes match the first, second, third, fourth and fifth fragments of an underscore-separated string.
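
To check the patterns outside Data Studio, here is a minimal Python sketch of the same idea. It assumes Python's re module behaves like Data Studio's regex engine for this particular pattern; the helper name extract_fragment and the sample value are hypothetical and only for illustration.

```python
import re
from typing import Optional


def extract_fragment(value: str, index: int) -> Optional[str]:
    """Return the zero-based index-th underscore-separated fragment, or None.

    Hypothetical helper: builds the same pattern as the answer above,
    skipping `index` "fragment + underscore" pairs, then capturing the next fragment.
    """
    pattern = r"^(?:[^_]+_){%d}([^_]+)" % index
    match = re.search(pattern, value)
    return match.group(1) if match else None


# Hypothetical sample value in the date_type_campaignname_audience_timeofday layout.
sample = "20230101_email_springsale_newusers_morning"
names = ["date", "type", "campaignname", "audience", "timeofday"]

# Map each field name to the fragment at the matching position.
fragments = {name: extract_fragment(sample, i) for i, name in enumerate(names)}
print(fragments)
```

If the string has fewer fragments than the requested index, the pattern does not match and the helper returns None.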

Credit goes to the author of this answer.
