Finding the difference in rows of a table, and returning only the columns where values are different

October 10, 2024

I’m analyzing some data. I can do this in Excel, but it’s slow and involves too much manual work, so I’d like a more efficient way to find what I’m looking for.

Here’s the scenario:
I have a DB table (multiple, but let’s just focus on a single one for now) that has many rows and many columns. Think of this as transactional data and we can call it Table0. It looks like the sample below.

Table0 has differences in columns 0, 2, 3 and 5, and identical data in columns 1 and 4. I need to process this table and return only the columns with differences: columns 0, 2, 3 and 5.

I’m looking for a solution in either Python or SQL (PostgreSQL) that produces the sample output table below. It doesn’t seem like a complex problem, but I don’t have time to build and debug a custom solution.

Are there any well-known methods of manipulating my data like this?

Table0
        C0   C1   C2   C3   C4   C5
    R0  aaa  ax   ay   aq   123  555
    R1  aab  ax   ay   aq   123  555
    R2  aac  ax   ay   aw   123  557
    R3  aad  ax   ax   aw   123  555
    R4  aae  ax   ay   aw   123  559
    R5  aaf  ax   ay   ae   123  555


Output
        C0   C2   C3   C5
    R0  aaa  ay   aq   555
    R1  aab  ay   aq   555
    R2  aac  ay   aw   557
    R3  aad  ax   aw   555
    R4  aae  ay   aw   559
    R5  aaf  ay   ae   555

Solution:

Using pandas:

Count the distinct values in each column with df.nunique(), flag the columns where that count is not equal to 1 using Series.ne, and select those columns with df.loc:

df.loc[:, df.nunique().ne(1)]

     C0  C2  C3   C5
R0  aaa  ay  aq  555
R1  aab  ay  aq  555
R2  aac  ay  aw  557
R3  aad  ax  aw  555
R4  aae  ay  aw  559
R5  aaf  ay  ae  555
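The one-liner assumes `df` already holds Table0. As a minimal, self-contained sketch, the sample can be rebuilt by hand as below; the dtypes (strings for C0 to C3, integers for C4 and C5) and the variable name `result` are assumptions made for illustration, not something the question specifies.

```python
import pandas as pd

# Rebuild the sample Table0 from the question (dtypes are assumed).
df = pd.DataFrame(
    {
        "C0": ["aaa", "aab", "aac", "aad", "aae", "aaf"],
        "C1": ["ax"] * 6,
        "C2": ["ay", "ay", "ay", "ax", "ay", "ay"],
        "C3": ["aq", "aq", "aw", "aw", "aw", "ae"],
        "C4": [123] * 6,
        "C5": [555, 555, 557, 555, 559, 555],
    },
    index=["R0", "R1", "R2", "R3", "R4", "R5"],
)

# Keep every row, but only the columns whose distinct-value count is not 1.
result = df.loc[:, df.nunique().ne(1)]
print(result)
```

If the data lives in PostgreSQL, one option is to load the table into a DataFrame first and then apply the same selection.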

The intermediate result:

df.nunique()

C0    6
C1    1 # -> `False` with .ne(1)
C2    2
C3    3
C4    1 # -> `False` with .ne(1)
C5    3
dtype: int64
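One caveat worth checking against real data: `nunique()` ignores missing values by default, so a column that is entirely NULL/NaN gets a count of 0 and passes the `.ne(1)` test. The sketch below shows one possible way to handle that; the helper name `varying_columns` and its `count_missing` flag are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd


def varying_columns(frame: pd.DataFrame, count_missing: bool = True) -> pd.DataFrame:
    """Return only the columns of `frame` that hold more than one distinct value.

    With count_missing=True, NaN counts as a value of its own, so a column
    mixing 1.0 and NaN is treated as varying and an all-NaN column is not.
    """
    distinct = frame.nunique(dropna=not count_missing)
    # gt(1) rather than ne(1): a count of 0 (all-NaN column) is excluded too.
    return frame.loc[:, distinct.gt(1)]


# Small made-up frame to exercise the edge cases.
sample = pd.DataFrame(
    {
        "a": [1.0, 1.0, np.nan],           # one real value plus a missing one
        "b": ["x", "x", "x"],              # constant
        "c": [np.nan, np.nan, np.nan],     # entirely missing
    }
)

print(list(varying_columns(sample).columns))
print(list(varying_columns(sample, count_missing=False).columns))
print(list(sample.loc[:, sample.nunique().ne(1)].columns))
```

Whether a missing value should count as a difference depends on the data, which is why the sketch leaves that choice to a flag instead of hard-coding it.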

postgresql

by MR

Published October 10, 2024
