Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to flag records using a CTE

I am using SQL Server, and I am trying to flag duplicate records using a CTE.

Duplicate records are selected based on having the same Call_GUID, Call_Type_ID, and Date.

I am using the ROW_NUMBER function to categorize the duplicate records. I want to flag all duplicate records except for the first record written based on the "DateTime" column.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The initial data looks like this:

DateTime Call_Type_ID Call_GUID Dup_Flag1 DupRank
2023-09-21 09:28:01.370 12986 00A4AB00000100000000520883048B0A NULL 1
2023-09-21 09:35:08.270 12986 00A4AB00000100000000520883048B0A NULL 2
2023-09-21 09:35:29.887 12986 00A4AB00000100000000520883048B0A NULL 3

The expected result should be this:

DateTime Call_Type_ID Call_GUID Dup_Flag1 DupRank
2023-09-21 09:28:01.370 12986 00A4AB00000100000000520883048B0A NULL 1
2023-09-21 09:35:08.270 12986 00A4AB00000100000000520883048B0A Y 2
2023-09-21 09:35:29.887 12986 00A4AB00000100000000520883048B0A Y 3

Instead, I am getting this:

DateTime Call_Type_ID Call_GUID Dup_Flag1 DupRank
2023-09-21 09:28:01.370 12986 00A4AB00000100000000520883048B0A Y 1
2023-09-21 09:35:08.270 12986 00A4AB00000100000000520883048B0A Y 2
2023-09-21 09:35:29.887 12986 00A4AB00000100000000520883048B0A Y 3

I am using the query below, but the "WHERE d.DupRank > 1" condition is not being applied.

I am also getting an error message when trying to put an alias (Dups) on the subquery.

WITH Dups AS
    (SELECT 
    "DateTime",
    Call_Type_ID,
    Call_GUID,
    Dup_Flag1,
    DupRank = ROW_NUMBER() OVER
        (PARTITION BY Call_GUID, Call_Type_ID, "Date" ORDER BY "DateTime" ASC)
    FROM #temp_records_SLS_U65_ALL
    WHERE Call_GUID = '00A4AB00000100000000520883048B0A') 

UPDATE #temp_records_SLS_U65_ALL
SET Dup_Flag1 = 'Y'
FROM #temp_records_SLS_U65_ALL AS t
INNER JOIN Dups AS d
ON t.Call_GUID = d.Call_GUID
WHERE d.DupRank > 1;

>Solution :

You need to include something to make your join more specific. You’re only joining your temp table to the dups table on Call_GUID. You need to be joining on at least all the fields you’re using in your partition by clause.

Even so, in your example, that’s not sufficient.

drop table if exists #table
create table #table
(
    Dt datetime,
    CallTypeId int,
    CallGuid varchar(50),
    DupFlag char(1)
)

insert into #Table
values
('2023-09-21 09:28:01.370', 12986, '00A4AB00000100000000520883048B0A',  NULL),
('2023-09-21 09:35:08.270', 12986, '00A4AB00000100000000520883048B0A',  NULL),
('2023-09-21 09:35:29.887', 12986, '00A4AB00000100000000520883048B0A',  NULL)

Consider, this is the query you’re essentially running to do your update:

;with dups as
(

    select 
        Dt,
        CallTypeId,
        CallGuid,
        DupFlag,
        DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
    from #table
)
select *
from #Table t
inner join dups d
    on t.CallGuid = d.CallGuid
        --and t.CallTypeId = d.CallTypeID
        --and t.Dt = d.Dt
where d.DupRank > 1

Notice you get six rows, when you started off with three. What you’re doing here is applying a value of "Y" to _every row with a CallTypeId of 12986 and a CallGuid of 00A4AB00000100000000520883048B0A. That’s.. everything.

You need to add additional fields in there (again, at least the other partition column(s) and in your case, also the datetime field)

;with dups as
(

    select 
        Dt,
        CallTypeId,
        CallGuid,
        DupFlag,
        DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
    from #table
)
select *
from #Table t
inner join dups d
    on t.CallGuid = d.CallGuid
        and t.CallTypeId = d.CallTypeID
        and t.Dt = d.Dt
where d.DupRank > 1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading