spark sql Insert string column to struct of array type column

January 31, 2022

I am trying to insert a STRING type column to an ARRAY of STRUCT TYPE column, but facing errors. Could you help to provide the right direction to do the INSERT.

In databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to insert into a transform table (tl_lms.transformedTable) where the columns are array of struct type.

CREATE TABLE raw_lms.rawTable
 (  PrimaryOwners STRING
  ,Owners STRING
 )
 USING DELTA LOCATION 'xxxx/rawTable'

CREATE TABLE tl_lms.transformedTable
 (  PrimaryOwners array<struct<Id:STRING>>
  ,Owners array<struct<Id:STRING>>
 )
 USING DELTA LOCATION 'xxxx/transformedTable'

Raw table has the below values populated: Eg.

INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
 "[]"
)

I try to insert to transform table and get the below error:

INSERT INTO tl_lms.transformedTable 
SELECT PrimaryOwners,
       Owners
FROM raw_lms.rawTable

Error in SQL statement: AnalysisException: cannot resolve
‘spark_catalog.raw_lms.rawTable.PrimaryOwners‘ due to data type
mismatch: cannot cast string to array<struct<Id:string>>;

I do not want to explode the data. I only need to simply insert row for a row between rawTable and transformedTable of different column data types.

Thanks for your time and help.

>Solution :

As the error messages states, you can’t insert a string as array. You need to use array and named_struct functions.

Change the type of raw table to correct type and types not strings and try this:

INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')), 
 null
);

Or if you want to keep columns as string in raw table, then use from_json to parse the strings into correct type before inserting:

INSERT INTO tl_lms.transformedTable 
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
       from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable