Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

spark sql Insert string column to struct of array type column

I am trying to insert a STRING type column to an ARRAY of STRUCT TYPE column, but facing errors. Could you help to provide the right direction to do the INSERT.

In databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to insert into a transform table (tl_lms.transformedTable) where the columns are array of struct type.

CREATE TABLE raw_lms.rawTable
 (  PrimaryOwners STRING
  ,Owners STRING
 )
 USING DELTA LOCATION 'xxxx/rawTable'
CREATE TABLE tl_lms.transformedTable
 (  PrimaryOwners array<struct<Id:STRING>>
  ,Owners array<struct<Id:STRING>>
 )
 USING DELTA LOCATION 'xxxx/transformedTable'

Raw table has the below values populated: Eg.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
 "[]"
)

enter image description here

I try to insert to transform table and get the below error:

INSERT INTO tl_lms.transformedTable 
SELECT PrimaryOwners,
       Owners
FROM raw_lms.rawTable

Error in SQL statement: AnalysisException: cannot resolve
‘spark_catalog.raw_lms.rawTable.PrimaryOwners‘ due to data type
mismatch: cannot cast string to array<struct<Id:string>>;

I do not want to explode the data. I only need to simply insert row for a row between rawTable and transformedTable of different column data types.

Thanks for your time and help.

>Solution :

As the error messages states, you can’t insert a string as array. You need to use array and named_struct functions.

Change the type of raw table to correct type and types not strings and try this:

INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')), 
 null
);

Or if you want to keep columns as string in raw table, then use from_json to parse the strings into correct type before inserting:

INSERT INTO tl_lms.transformedTable 
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
       from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading