Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Trying to delete duplicate rows in MSSQL where the difference is the date or batch number

I have this query:

SELECT T1.ID_NUMBER,                                                                    

  T1.INCEPTION_DATE,

  T1.OCCURANCE,

  T1.TRANSACTION_DATE,

  T1.FILE_LOAD_DATE,

  T1.BATCH_NUM

FROM mastertable T1

INNER JOIN (SELECT ID_NUMBER,INCEPTION_DATE,OCCURANCE, COUNT(*) AS DUPL_COUNT

    FROM mastertable

    WHERE SOURCE_SYSTEM ='LEGACY'

    GROUP BY ID_NUMBER,INCEPTION_DATE,OCCURANCE

    HAVING COUNT(*) > 1

    ) t2 ON T2.ID_NUMBER = T1.ID_NUMBER AND T2.INCEPTION_DATE = T1.INCEPTION_DATE AND T2.OCCURANCE = T1.OCCURANCE 

ORDER BY 1,2,3,4,5

Which is returning the following results

ID_NUMBER INCEPTION_DATE OCCURANCE TRANSACTION_DATE FILE_LOAD_DATE BATCH_NUM
112897732 2008-09-15 4 2008-07-03 2008-07-07 17:57:19 06341
112897732 2008-09-15 4 2008-07-13 2008-07-18 03:35:55 06753
828194721 2008-11-11 1 2008-09-06 2008-09-17 02:50:44 97334
828194721 2008-11-11 1 2008-09-23 2008-09-24 02:55:27 98331
456457422 2008-09-28 1 2008-12-03 2008-07-13 08:08:39 00734
456457422 2008-09-28 1 2008-12-03 2008-07-18 13:35:55 00991
999272910 2008-05-07 3 2008-05-03 2008-10-13 08:08:38 11432
999272910 2008-05-07 3 2008-05-28 2008-10-18 03:35:55 13342
875328642 2008-03-01 3 2008-04-28 2008-01-23 08:08:38 74542
875328642 2008-03-01 3 2008-04-30 2008-01-25 12:55:11 77536
011028734 2008-07-12 2 2008-12-03 2008-08-07 11:57:03 23422
011028734 2008-07-12 2 2008-12-03 2008-08-11 17:23:29 25748
018264981 2008-07-09 0 2008-12-03 2008-12-07 02:18:12 00432
018264981 2008-07-09 0 2008-12-03 2008-12-11 17:44:19 00773

The earlier FILE_LOAD_DATE of each ID_NUMBER or lesser BATCH_NUM is the record I want to keep.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Is there a way to write a query that will delete the other records, perhaps using a CTE with ROW_NUMBER()?

I am hoping for something that is DRY in case this issue happens again. Thanks!

(Also if it isn’t too much trouble please explain how the solution works)

>Solution :

You could use a deletable CTE here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURANCE
                                 ORDER BY FILE_LOAD_DATE, BATCH_NUM) rn
    FROM mastertable
    WHERE SOURCE_SYSTEM = 'LEGACY'
)

DELETE
FROM cte
WHERE rn > 1;

The logic here is to assign a row number to each group of records having the same values for ID_NUMBER, INCEPTION_DATE, and OCCURANCE. The first row number value of 1 will be assigned to the record having the earliest FILE_LOAD_DATE. In cases of two or more records tied for the earliest FILE_LOAD_DATE, the tie will be broken by the earliest BATCH_NUM.

The delete statement removes all records except for this earliest record.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading