Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SQL Delete specific rows based on date and criteria

I’ve got an issue I’m struggling to solve. I’ve got a table that has duplicate data that needs to be cleaned up. Consider the following example:

CREATE TABLE #StackOverFlow
(
    [ctrc_num] int, 
    [Ctrc_name] varchar(6),
    [docu] bit, 
    [adj] bit, 
    new bit, 
    [some_date] datetime
);
    
INSERT INTO #StackOverFlow
    ([ctrc_num], [Ctrc_name], [docu], [adj], [new], [some_date])
VALUES
    (12345, 'John R', null, null, 1, '2023-12-11 09:05:13.003'),
    (12345, 'John R', 1, null, 0, '2023-12-11 09:05:12.987'),
    (12345, 'John R', null, null, 1, '2023-12-11 09:05:12.947'),
    (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:13.003'),
    (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:12.987'),
    (56789, 'Sam S', 1, null, 0, '2023-12-11 09:05:12.947'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:13.003'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.987'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.947');

This gives me:

[ctrc_num]  [Ctrc_name] [docu]  [adj]   [new]   [some_date]
-----------------------------------------------------------------------
12345        John R     NULL    NULL    1       2023-12-11 09:05:13.003
12345        John R     1       NULL    0       2023-12-11 09:05:12.987
12345        John R     NULL    NULL    1       2023-12-11 09:05:12.947
56789        Sam S      NULL    NULL    1       2023-12-11 09:05:13.003
56789        Sam S      NULL    NULL    1       2023-12-11 09:05:12.987
56789        Sam S      1       NULL    0       2023-12-11 09:05:12.947
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:13.003
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:12.987
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:12.947

What I need to do is delete from the table duplicates. If new is 0, delete the records where new is 1. If all records have new = 1 keep the newest record and delete the older ones.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The result should look like this:

[ctrc_num]  [Ctrc_name] [docu]  [adj]  [new]    [some_date]
-----------------------------------------------------------------------
12345        John R     1       NULL    0       2023-12-11 09:05:12.987
56789        Sam S      1       NULL    0       2023-12-11 09:05:12.947
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:13.003

I’ve tried ROW_NUMBER:

;WITH RankedByDate AS
(
    SELECT 
        ctrc_num, Ctrc_name,
        docu, adj, new, some_date,
        ROW_NUMBER() OVER (PARTITION BY Ctrc_num, Ctrc_name, [docu],[adj], [new] 
                           ORDER BY some_date DESC) AS rNum
    FROM 
        #StackOverFlow
)
SELECT * 
FROM RankedByDate

This separates the ones with new = 0, but I still have the ones with new = 1 that are ordered.

Grouping gives me the records that are duplicated but no way to delete the ones needed to be deleted:

SELECT [ctrc_num]
    ,[Ctrc_name]
    ,[docu]
    ,[adj]
    ,[new]
FROM 
    #StackOverFlow
GROUP BY 
    [ctrc_num]
    ,[Ctrc_name]
    ,[docu]
    ,[adj]
    ,[new]
HAVING 
    COUNT(*) > 1

>Solution :

Break the problem down into it’s parts

  1. "If new is 0, delete the records where new is 1"

    delete from #StackOverFlow
    where [new] = 1 and [ctrc_num] in (select [ctrc_num] from #StackOverFlow where [new] = 0);

  2. "If all records have new = 1 keep the newest record and delete the older ones"
    Use a CTE to add a row number based on the date and partitioned by the [ctrc_num] such that the "first" record in each group is the one you want to keep – if there is only 1 row in a group that’s the one you want to keep anyway. Then delete everything else

;with cte as
(
    select 
         [ctrc_num]  
         ,ROW_NUMBER() OVER (PARTITION BY [ctrc_num] ORDER BY [ctrc_num], [some_date] DESC) as rw
    from #StackOverFlow
)
DELETE FROM cte where rw <> 1;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading