I am trying to insert a STRING type column into an ARRAY of STRUCT type column, but I am facing errors. Could you help point me in the right direction for the INSERT?
In a Databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to be inserted into a transform table (tl_lms.transformedTable) where the columns are array of struct type.
CREATE TABLE raw_lms.rawTable
( PrimaryOwners STRING
,Owners STRING
)
USING DELTA LOCATION 'xxxx/rawTable'
CREATE TABLE tl_lms.transformedTable
( PrimaryOwners array<struct<Id:STRING>>
,Owners array<struct<Id:STRING>>
)
USING DELTA LOCATION 'xxxx/transformedTable'
The raw table has the below values populated, e.g.:
INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
"[]"
)
I try to insert into the transform table and get the below error:
INSERT INTO tl_lms.transformedTable
SELECT PrimaryOwners,
Owners
FROM raw_lms.rawTable
Error in SQL statement: AnalysisException: cannot resolve
'spark_catalog.raw_lms.rawTable.PrimaryOwners' due to data type
mismatch: cannot cast string to array<struct<Id:string>>;
I do not want to explode the data. I only need a simple row-for-row insert between rawTable and transformedTable, whose columns have different data types.
Thanks for your time and help.
>Solution :
As the error message states, you can't insert a string into an array column. You need to use the array and named_struct functions.
Change the raw table's column types to the correct array-of-struct types and insert actual values rather than strings, like this:
INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')),
null
);
Or, if you want to keep the columns as strings in the raw table, use from_json to parse the strings into the correct type when inserting:
INSERT INTO tl_lms.transformedTable
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable
