I’m trying to extract the ID of a Notion database from a URL, e.g. the d77d1d19d4a943358898f2be65499d6a part of https://www.notion.so/anotioneer/d77d1d19d4a943358898f2be65499d6a?v=1dedd49c5403489ebb899a290111f858.
I can match everything after anotioneer/ with anotioneer\/(.+) and everything before the ? with .*(?=\?), but I’m struggling to combine the two expressions.
>Solution :
Would something like this work?
anotioneer\/(\w+)(?:$|\?)
- Start with the path segment that precedes the ID: anotioneer\/.
- Take one or more word characters (letters, digits, or underscores) as a group: (\w+).
- Match either the end of the line or the start of the query string in a non-capturing group: (?:$|\?).
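As a rough sketch of the pattern in use, here it is wrapped in a small Python function. Python is only an assumption, since the question doesn’t name a language, and the name extract_database_id is made up for illustration. The \/ escape is dropped because / has no special meaning in a Python pattern.

```python
import re

# Same pattern as above; "/" needs no escaping in a Python raw string.
PATTERN = re.compile(r"anotioneer/(\w+)(?:$|\?)")


def extract_database_id(url):
    """Return the segment right after anotioneer/, or None if the URL doesn't fit."""
    match = PATTERN.search(url)
    # Group 1 holds the ID; the whole match also includes "anotioneer/" and the "?".
    return match.group(1) if match else None


url = "https://www.notion.so/anotioneer/d77d1d19d4a943358898f2be65499d6a?v=1dedd49c5403489ebb899a290111f858"
print(extract_database_id(url))  # d77d1d19d4a943358898f2be65499d6a
```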
Here’s the valid sample data I used:
/anotioneer/d77d1d19d4a943358898f2be65499d6a?v=1dedd49c5403489ebb899a290111f858
/anotioneer/d79d1d19d4a943358898f2be65499d6a?v=3dedd49c5403489ebb899a290111f858
/anotioneer/d80d1d19d4a943358898f2be65499d6a?v=4dedd49c5403489ebb899a290111f858&t=123
One that doesn’t match because there’s an extra path segment between anotioneer and the ID:
/anotioneer/foo/d78d1d19d4a943358898f2be65499d6a?v=2dedd49c5403489ebb899a290111f858
And one that doesn’t match because there’s an extra path segment after the ID:
/anotioneer/d81d1d19d4a943358898f2be65499d6a/foo
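If you want to check these five samples yourself, something like the following should do it. This is a sketch in Python (my choice of language, not something the question specifies), and first_group is just an illustrative helper name.

```python
import re

PATTERN = re.compile(r"anotioneer/(\w+)(?:$|\?)")

samples = [
    "/anotioneer/d77d1d19d4a943358898f2be65499d6a?v=1dedd49c5403489ebb899a290111f858",
    "/anotioneer/d79d1d19d4a943358898f2be65499d6a?v=3dedd49c5403489ebb899a290111f858",
    "/anotioneer/d80d1d19d4a943358898f2be65499d6a?v=4dedd49c5403489ebb899a290111f858&t=123",
    "/anotioneer/foo/d78d1d19d4a943358898f2be65499d6a?v=2dedd49c5403489ebb899a290111f858",
    "/anotioneer/d81d1d19d4a943358898f2be65499d6a/foo",
]


def first_group(path):
    """Return capture group 1 for a path, or None when the pattern doesn't match."""
    match = PATTERN.search(path)
    return match.group(1) if match else None


results = [first_group(s) for s in samples]
for sample, result in zip(samples, results):
    print(result, "<-", sample)
```

The first three should print their IDs and the last two should print None, because in both of those the \w+ run is followed by a / rather than a ? or the end of the line.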
Here’s what the matches look like using this pattern. Note that you’ll want to take the first group, not the whole match. That’s why we used a non-capturing group for the end-of-line or query string segment.
| Part | Location | Contents |
|---|---|---|
| Match 1 | 22-66 | anotioneer/d77d1d19d4a943358898f2be65499d6a? |
| Group 1 | 33-65 | d77d1d19d4a943358898f2be65499d6a |
| — | — | — |
| Match 2 | 123-167 | anotioneer/d79d1d19d4a943358898f2be65499d6a? |
| Group 1 | 134-166 | d79d1d19d4a943358898f2be65499d6a |
| — | — | — |
| Match 3 | 224-268 | anotioneer/d80d1d19d4a943358898f2be65499d6a? |
| Group 1 | 235-267 | d80d1d19d4a943358898f2be65499d6a |
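To see the difference between the whole match and group 1 in code, here is a small Python sketch (Python being my assumption). The offsets in the table are consistent with the three matching samples having been tested as full URLs, one per line, so the https://www.notion.so prefix below is an inference from those offsets rather than something stated above.

```python
import re

# MULTILINE lets $ match at the end of each line, not just the end of the text.
PATTERN = re.compile(r"anotioneer/(\w+)(?:$|\?)", re.MULTILINE)

BASE = "https://www.notion.so"  # assumed prefix, inferred from the table offsets
paths = [
    "/anotioneer/d77d1d19d4a943358898f2be65499d6a?v=1dedd49c5403489ebb899a290111f858",
    "/anotioneer/d79d1d19d4a943358898f2be65499d6a?v=3dedd49c5403489ebb899a290111f858",
    "/anotioneer/d80d1d19d4a943358898f2be65499d6a?v=4dedd49c5403489ebb899a290111f858&t=123",
]
text = "\n".join(BASE + p for p in paths)

# For each match: span of the whole match, span of group 1, and the ID itself.
rows = [(m.span(), m.span(1), m.group(1)) for m in PATTERN.finditer(text)]
for whole_span, group_span, database_id in rows:
    print(whole_span, group_span, database_id)
```

The whole-match span is one character longer at the end than the group span plus the anotioneer/ prefix, because it swallows the ?; that’s the part group 1 leaves out.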