Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Finding the difference in rows of a table, and returning only the columns where values are different

I’m working on analyzing some data. I have a way of doing this in excel, but it’s slow and too much manual work. I’d like to find a more effective way to find what I’m looking for.

Here’s the scenario:
I have a DB table (multiple, but let’s just focus on a single one for now) that has many rows and many columns. Think of this as transactional data and we can call it Table0. It looks like the sample below.

Table0 has differences in columns 0,2,3,5 and has identical data in columns 1,4. I need to process this table, and only return the columns with differences: columns 0,2,3,5.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’m looking for a solution that will work with either Python or SQL (postgres) that can provide the sample output table below. It doesn’t seem like a complex issue, but I don’t have the luxury of time to get a custom solution running properly.

Are there any well-known methods of manipulating my data like this?

Table0
        C0   C1   C2   C3   C4   C5
    R0  aaa  ax   ay   aq   123  555
    R1  aab  ax   ay   aq   123  555
    R2  aac  ax   ay   aw   123  557
    R3  aad  ax   ax   aw   123  555
    R4  aae  ax   ay   aw   123  559
    R5  aaf  ax   ay   ae   123  555


Output
        C0   C2   C3   C5
    R0  aaa  ay   aq   555
    R1  aab  ay   aq   555
    R2  aac  ay   aw   557
    R3  aad  ax   aw   555
    R4  aae  ay   aw   559
    R5  aaf  ay   ae   555

>Solution :

Using pandas:

df.loc[:, df.nunique().ne(1)]

     C0  C2  C3   C5
R0  aaa  ay  aq  555
R1  aab  ay  aq  555
R2  aac  ay  aw  557
R3  aad  ax  aw  555
R4  aae  ay  aw  559
R5  aaf  ay  ae  555

The intermediate:

df.nunique()

C0    6
C1    1 # -> `False` with .ne(1)
C2    2
C3    3
C4    1 # -> `False` with .ne(1)
C5    3
dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading