How to remove empty words from my SQL string_to_array?

June 20, 2022

I have a text column in my Postgres table and I want to remove any non-alphabetical characters and split it by a space (so I can use each word to do a search later).

I can remove the characters and split it successfully however my rows have empty results. I cannot have any empty words:

SELECT
    asset.id,
    string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z ]', '', 'g')), ' ')
FROM
    assets asset

eg.
Hello world! becomes {Hello,world}
but also Some Result V1.0 - 3D Model becomes {Some,Result,V,,D,Model

How do I filter my array to remove empty words?

>Solution :

Try using a multicharacter regular expression

Try with this:

SELECT
    asset.id,
    string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z]+', ' ', 'g')), ' ')
FROM
    assets asset

I just removed the whitespace from the regex in order to capture every non-alphabet character. Then, the + indicates one or more coincidences, which ensures than any consecutive non-alphabetical character will get replaced with a single space. Finally, as you already do the trim, your split by whitespace will work.