I am working on Databricks SQL and trying to parse strings into separate columns.
Here is an example that is working for me:
regexp_extract('100-200', '(\\d+)-(\\d+)', 1)
returns: 100
My specific example is splitting a column called match_name (i.e Detroit Lions at New York Jets) so I can get home and away team into their own separate columns (i.e AWAY = Detroit Lions, HOME = New York Jets)
Here is some code I have tried to implement against the match_name column in my SQL query to attempt to split at the word "at" – none of it currently works
split(match_name,'[at]', 1),
regexp_extract(match_name,'(\\d+)at(\\d+)', 1),
Any help is appreciated!
>Solution :
I’ve run into this frustration with Databricks as well, as I thought split_part was a valid function. At any rate, here’s one way to use regexp_extract…
with my_table as (
select 'Team A at Team B' as game union all
select 'Atlanta at Dallas'
)
select game,
regexp_extract(game, "(.*) at (.*)", 1) as away_team,
regexp_extract(game, "(.*) at (.*)", 2) as home_team
from my_table
Output:
game | away_team | home_team |
---|---|---|
Team A at Team B | Team A | Team B |
Atlanta at Dallas | Atlanta | Dallas |