Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Is there a way to revert a numeric datatype back to date time? (python pandas, snowflake)

I was converting a pandas dataframe column from a string ("YYYY-MM-DDTHH:MM:SSZ") to date time. After the conversion I stored it in a snowflake database where it ended up being a numerical value that did not reflect the timestamp at all. On my first run of this program it worked as intended (correctly converting to a timestamp in snowflake) but on a later run, when a new table was concatenated, the result column ended up being in this numeric format.

"2023-07-20T14:43:42Z" -> 1,689,864,222,000,000

Just wondering if anyone’s seen this before, and maybe knows what is happening.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The process grabs the existing table in snowflake, and then converts all columns to string type (in order to concatenate with possible miss-matched data types) and then reuploads the new table after dropping duplicates. It was done this way as occasionally, the new table would contain a column which was missing from the existing table.

import pandas as pd
from snowflake.snowpark import Session

# session = create snowflake session
# newdf = new data into a new dataframe
 
existingdf = session.table(tablename).toPandas()

existingdf = existingdf.astype(str)
newdf = newdf.astype(str)

df = pd.concat([existingdf,newdf],ignore_index=True,sort=False)
df = df.loc[df.astype(str).drop_duplicates().index]

df["CREATIONTIME"] = pd.to_datetime(df["CREATIONTIME"], utc=True)

df = session.create_dataframe(df)
df.write.mode(action).save_as_table("TABLE_NAME")

In snowflake, the "CREATIONTIME" column is a fixed numeric
snowflake result column

Whereas the original csv is in this format
csv column

The above should be displaying the same results (just with a different datatype) rather than what actually occurred

"2023-07-20T14:43:42Z" -> 1,689,864,222,000,000

>Solution :

Snowpark currently has a bug related to Pandas datetime64[ns]. See SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date. The work around is to write a string to the Snowflake timestamp column:

for column in df_columns:
    if is_datetime64_dtype(df[column]):
        df[column] = df[column].dt.strftime('%Y-%m-%d %H:%M:%S.%f')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading