Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use dbt's incremental model without duplicates

I have a query that selects some data that I would like to use to create an incremental table. Something like:

{{
    config(
        materialized='incremental',
        unique_key='customer_id'
    )
}}
SELECT
    customer_id,
    email,
    updated_at,
    first_name,
    last_name
FROM data

The input data has duplicate customers in it. If I read the documentation correctly, then records with the same unique_key should be seen as the same record. They should be updated instead of creating duplicates in the final table. However, I am seeing duplicates in the final table instead. What am I doing wrong?

I am using Snowflake as a datawarehouse.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

If your source table already contains the duplicate, this is the regular behavior.

As per dbt documentation: "The first time a model is run, the table is built by transforming all rows of source data."

Docs: https://docs.getdbt.com/docs/build/incremental-models

This means basically that the duplicates will be avoided in all future loads, but not during the initial creation. Hence you need to change your SELECT statement so that duplicates are somehow filtered out in the creation itself.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading