Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Databricks SQL regex_extract/split field before and after specific wordword

I am working on Databricks SQL and trying to parse strings into separate columns.

Here is an example that is working for me:

regexp_extract('100-200', '(\\d+)-(\\d+)', 1)
returns: 100

My specific example is splitting a column called match_name (i.e Detroit Lions at New York Jets) so I can get home and away team into their own separate columns (i.e AWAY = Detroit Lions, HOME = New York Jets)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here is some code I have tried to implement against the match_name column in my SQL query to attempt to split at the word "at" – none of it currently works

split(match_name,'[at]', 1),
regexp_extract(match_name,'(\\d+)at(\\d+)', 1),

Any help is appreciated!

>Solution :

I’ve run into this frustration with Databricks as well, as I thought split_part was a valid function. At any rate, here’s one way to use regexp_extract…

with my_table as (
select 'Team A at Team B' as game union all
select 'Atlanta at Dallas'
)
select game, 
 regexp_extract(game, "(.*) at (.*)", 1) as away_team, 
 regexp_extract(game, "(.*) at (.*)", 2) as home_team
from my_table

Output:

game away_team home_team
Team A at Team B Team A Team B
Atlanta at Dallas Atlanta Dallas
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading