Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Can't view Staged Parquet File in S3 from Snowflake

I’m working on moving some Parquet files in S3 over to Snowflake. The Storage Integration & External Stage were created just fine, and when I run the list @mystage command I can see the file that I want to check out in S3 so I know it exists & that my connection works.

When I go to do any kind of querying on the columns or table metadata, nothing is returned.

I’m following the basic documentation from this Snowflake Page.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

list @manual_test_stage;

enter image description here

create stage manual_test_stage
  url = 's3://cs-snowflake-poc/manual-loads/'
  storage_integration = cs_snowflake_poc_s3;

create or replace file format parquet_format
  type = 'parquet';

select *
  from table(
    infer_schema(
      location=>'@manual_test_stage/campspot_production.Campsite/test.parquet',
      file_format=>'parquet_format'
      )
    );

-- neither query works
select *
  from table(
    infer_schema(
      location=>'@manual_test_stage',
      file_format=>'parquet_format'
      )
    );

-- end goal is to use this to just create the table for me by automatically reading the schema, but i can't do that when select array_agg returns an empty array.
create table mytable
  using template (
    select array_agg(object_construct(*))
      from table(
        infer_schema(
          location=>'@manual_test_stage/campspot_production.Campsite/test.parquet',
          file_format=>'my_parquet_format'
        )
      ));

Tried all sorts of things: I recreated the stage to not include a / at the end, I tried like 5 different file format options in case that was screwing it up, and I loaded the parquet into Python to make sure it wasn’t some issue with the actual file.

If anyone can help show me what I’m missing to get this to work it’d be much appreciated !

>Solution :

INFER_SCHEMA allows to define FILES parameter.

select *
  from table(
    infer_schema(
      location=>'@manual_test_stage/campspot_production.Campsite',
      file_format=>'parquet_format',
      files=>'test.parquet'
      )
    );
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading