How to flag records using a CTE

October 20, 2023

I am using SQL Server, and I am trying to flag duplicate records using a CTE.

Duplicate records are selected based on having the same Call_GUID, Call_Type_ID, and Date.

I am using the ROW_NUMBER function to categorize the duplicate records. I want to flag all duplicate records except for the first record written based on the "DateTime" column.

The initial data looks like this:

DateTime	Call_Type_ID	Call_GUID	Dup_Flag1	DupRank
2023-09-21 09:28:01.370	12986	00A4AB00000100000000520883048B0A	NULL	1
2023-09-21 09:35:08.270	12986	00A4AB00000100000000520883048B0A	NULL	2
2023-09-21 09:35:29.887	12986	00A4AB00000100000000520883048B0A	NULL	3

The expected result should be this:

DateTime	Call_Type_ID	Call_GUID	Dup_Flag1	DupRank
2023-09-21 09:28:01.370	12986	00A4AB00000100000000520883048B0A	NULL	1
2023-09-21 09:35:08.270	12986	00A4AB00000100000000520883048B0A	Y	2
2023-09-21 09:35:29.887	12986	00A4AB00000100000000520883048B0A	Y	3

Instead, I am getting this:

DateTime	Call_Type_ID	Call_GUID	Dup_Flag1	DupRank
2023-09-21 09:28:01.370	12986	00A4AB00000100000000520883048B0A	Y	1
2023-09-21 09:35:08.270	12986	00A4AB00000100000000520883048B0A	Y	2
2023-09-21 09:35:29.887	12986	00A4AB00000100000000520883048B0A	Y	3

I am using the query below, but the "WHERE d.DupRank > 1" condition is not being applied.

I am also getting an error message when trying to put an alias (Dups) on the subquery.

WITH Dups AS
    (SELECT 
    "DateTime",
    Call_Type_ID,
    Call_GUID,
    Dup_Flag1,
    DupRank = ROW_NUMBER() OVER
        (PARTITION BY Call_GUID, Call_Type_ID, "Date" ORDER BY "DateTime" ASC)
    FROM #temp_records_SLS_U65_ALL
    WHERE Call_GUID = '00A4AB00000100000000520883048B0A') 

UPDATE #temp_records_SLS_U65_ALL
SET Dup_Flag1 = 'Y'
FROM #temp_records_SLS_U65_ALL AS t
INNER JOIN Dups AS d
ON t.Call_GUID = d.Call_GUID
WHERE d.DupRank > 1;

>Solution :

You need to include something to make your join more specific. You’re only joining your temp table to the dups table on Call_GUID. You need to be joining on at least all the fields you’re using in your partition by clause.

Even so, in your example, that’s not sufficient.

drop table if exists #table
create table #table
(
    Dt datetime,
    CallTypeId int,
    CallGuid varchar(50),
    DupFlag char(1)
)

insert into #Table
values
('2023-09-21 09:28:01.370', 12986, '00A4AB00000100000000520883048B0A',  NULL),
('2023-09-21 09:35:08.270', 12986, '00A4AB00000100000000520883048B0A',  NULL),
('2023-09-21 09:35:29.887', 12986, '00A4AB00000100000000520883048B0A',  NULL)

Consider, this is the query you’re essentially running to do your update:

;with dups as
(

    select 
        Dt,
        CallTypeId,
        CallGuid,
        DupFlag,
        DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
    from #table
)
select *
from #Table t
inner join dups d
    on t.CallGuid = d.CallGuid
        --and t.CallTypeId = d.CallTypeID
        --and t.Dt = d.Dt
where d.DupRank > 1

Notice you get six rows, when you started off with three. What you’re doing here is applying a value of "Y" to _every row with a CallTypeId of 12986 and a CallGuid of 00A4AB00000100000000520883048B0A. That’s.. everything.

You need to add additional fields in there (again, at least the other partition column(s) and in your case, also the datetime field)

;with dups as
(

    select 
        Dt,
        CallTypeId,
        CallGuid,
        DupFlag,
        DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
    from #table
)
select *
from #Table t
inner join dups d
    on t.CallGuid = d.CallGuid
        and t.CallTypeId = d.CallTypeID
        and t.Dt = d.Dt
where d.DupRank > 1