I am using SQL Server, and I am trying to flag duplicate records using a CTE.
Duplicate records are selected based on having the same Call_GUID, Call_Type_ID, and Date.
I am using the ROW_NUMBER function to categorize the duplicate records. I want to flag all duplicate records except for the first record written based on the "DateTime" column.
The initial data looks like this:
| DateTime | Call_Type_ID | Call_GUID | Dup_Flag1 | DupRank |
|---|---|---|---|---|
| 2023-09-21 09:28:01.370 | 12986 | 00A4AB00000100000000520883048B0A | NULL | 1 |
| 2023-09-21 09:35:08.270 | 12986 | 00A4AB00000100000000520883048B0A | NULL | 2 |
| 2023-09-21 09:35:29.887 | 12986 | 00A4AB00000100000000520883048B0A | NULL | 3 |
The expected result should be this:
| DateTime | Call_Type_ID | Call_GUID | Dup_Flag1 | DupRank |
|---|---|---|---|---|
| 2023-09-21 09:28:01.370 | 12986 | 00A4AB00000100000000520883048B0A | NULL | 1 |
| 2023-09-21 09:35:08.270 | 12986 | 00A4AB00000100000000520883048B0A | Y | 2 |
| 2023-09-21 09:35:29.887 | 12986 | 00A4AB00000100000000520883048B0A | Y | 3 |
Instead, I am getting this:
| DateTime | Call_Type_ID | Call_GUID | Dup_Flag1 | DupRank |
|---|---|---|---|---|
| 2023-09-21 09:28:01.370 | 12986 | 00A4AB00000100000000520883048B0A | Y | 1 |
| 2023-09-21 09:35:08.270 | 12986 | 00A4AB00000100000000520883048B0A | Y | 2 |
| 2023-09-21 09:35:29.887 | 12986 | 00A4AB00000100000000520883048B0A | Y | 3 |
I am using the query below, but the "WHERE d.DupRank > 1" condition is not being applied.
I am also getting an error message when trying to put an alias (Dups) on the subquery.
WITH Dups AS
(SELECT
"DateTime",
Call_Type_ID,
Call_GUID,
Dup_Flag1,
DupRank = ROW_NUMBER() OVER
(PARTITION BY Call_GUID, Call_Type_ID, "Date" ORDER BY "DateTime" ASC)
FROM #temp_records_SLS_U65_ALL
WHERE Call_GUID = '00A4AB00000100000000520883048B0A')
UPDATE #temp_records_SLS_U65_ALL
SET Dup_Flag1 = 'Y'
FROM #temp_records_SLS_U65_ALL AS t
INNER JOIN Dups AS d
ON t.Call_GUID = d.Call_GUID
WHERE d.DupRank > 1;
>Solution :
You need to include something to make your join more specific. You’re only joining your temp table to the dups table on Call_GUID. You need to be joining on at least all the fields you’re using in your partition by clause.
Even so, in your example, that’s not sufficient.
drop table if exists #table
create table #table
(
Dt datetime,
CallTypeId int,
CallGuid varchar(50),
DupFlag char(1)
)
insert into #Table
values
('2023-09-21 09:28:01.370', 12986, '00A4AB00000100000000520883048B0A', NULL),
('2023-09-21 09:35:08.270', 12986, '00A4AB00000100000000520883048B0A', NULL),
('2023-09-21 09:35:29.887', 12986, '00A4AB00000100000000520883048B0A', NULL)
Consider, this is the query you’re essentially running to do your update:
;with dups as
(
select
Dt,
CallTypeId,
CallGuid,
DupFlag,
DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
from #table
)
select *
from #Table t
inner join dups d
on t.CallGuid = d.CallGuid
--and t.CallTypeId = d.CallTypeID
--and t.Dt = d.Dt
where d.DupRank > 1
Notice you get six rows, when you started off with three. What you’re doing here is applying a value of "Y" to _every row with a CallTypeId of 12986 and a CallGuid of 00A4AB00000100000000520883048B0A. That’s.. everything.
You need to add additional fields in there (again, at least the other partition column(s) and in your case, also the datetime field)
;with dups as
(
select
Dt,
CallTypeId,
CallGuid,
DupFlag,
DupRank = row_number() over (partition by CallGuid, CallTypeId order by DT)
from #table
)
select *
from #Table t
inner join dups d
on t.CallGuid = d.CallGuid
and t.CallTypeId = d.CallTypeID
and t.Dt = d.Dt
where d.DupRank > 1