Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

more efficient way to select duplicate users

Im trying to select * from all duplicate rows in users, where a duplicate is defined as two users sharing the same first_name and last_name. (I need to process the other columns that might differ)

Im using MySQL 8.0.28.

My first try was to literally translate my requirement:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

select * from `users` AS u1 where exists (select 1 from `users` AS u2 WHERE `u2`.`first_name` = `u1`.`first_name` AND `u2`.`last_name` = `u1`.`last_name` AND `u2`.`id` != `u1`.`id`)

Which, obviously, has a horrendous execution time.

My current query is

SELECT * from users where  Concat(first_name," ",last_name) IN (select Concat(first_name," ",last_name) from `users` GROUP BY first_name, last_name HAVING COUNT(*)>1)

which is vastly more efficient, but still takes more than 100ms for 8000 records. I suppose a solution that doesn’t use concat could benefit from indicies and would not need to calculate the result for each row.

Also, I couldn’t get group by to work because I need so select all columns of all rows that are duplicates, not just the distinct first_name‘s and last_name‘s. Also because I don’t want to disable ONLY_FULL_GROUP_BY (not sure if disabling that would help anyway).

Is there a more efficient, proper way to select these duplicate rows?

>Solution :

I would just use an aggregation approach here:

SELECT *
FROM users
WHERE (first_name, last_name) IN (
    SELECT first_name, last_name
    FROM users
    GROUP BY 1, 2
    HAVING COUNT(*) > 1
);

On MySQL 8+, we can also use COUNT() as an analytic function here:

WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY first_name, last_name) AS cnt
    FROM users
)

SELECT *
FROM cte
WHERE cnt > 1;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading