Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SQL: Flattening multiple rows in ugly data

I’m working on a fairly odd scenario in which I’ve got to compensate for a client’s database that isn’t very well-ordered. The data (example below) concerns some products that have a main serial number (serial) and between 0 and 4 additional alternate serials (alt_serial). Unfortunately, neither of these columns is a primary key, and each alternate serial number produces an extra row duplicating the first.

test_table:
id | serial     | type           | alt_serial
----+------------+----------------+-----------------
0 | XL00007    | AA             | XL700001
1 | XL00007    | AB             | XL700002
2 | MARF665    | AC             | XTRA0001
3 | MARF665    | AD             | XTRA0002
4 | MARF665    | AE             | XTRA0003
5 | GLOMP12    | AF             | GLOMPX01
6 | GLOMP12    | AG             | GLOMPX02
7 | GLOMP12    | AH             | GLOMPX03
8 | SLONK15    | AI             | SLONKX01
9 | SLONK15    | AJ             | SLONKX02

The goal is to produce a single query (unions are ok) that essentially flattens this data into one row per primary serial, with the contents of the alternate serials condensed (e.g. using CONCAT) into a string in a single column. For example, given the table above, the ideal result of this query would be:

serial      | alt_serials
------------+-----------------------------------------------------
XL00007     | XL700001|XL700002
MARF665     | XTRA0001|XTRA0002|XTRA0003
GLOMP12     | GLOMPX01|GLOMPX02|GLOMPX03
SLONK15     | SLONKX01|SLONKX02

I realize that the ideal solution here is to fix the poorly formed table. Unfortunately, I do not have any kind of access or permissions to modify the data – I’m stuck with the existing table unchanged. For reasons specific to this implementation, I’m also not able to create additional databases or tables.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’m not a DBA, but I dabble in SQL. I’ve tried a couple of approaches, but the complexity of my queries very quickly explodes as I try to account for this data, and I imagine that there are better ways. Rather than post any of the unsuccessful attempts I’ve come up with, I wanted to ask Stack Overflow’s SQL Gurus: If you were stuck with this unfortunate data, how would you approach the problem?

>Solution :

  • you can use the string_agg function to concat all of them into 1
select
serial,
string_agg(alt_serial,',') as all_serials
from test_table
group by 1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading