dataframe: print entire row/s where keys in the same row hold equal values

I would like to recover the rows of a dataframe in which different keys (columns) hold equal values within the same row. I can already display, for instance, the rows where col2 == col3. I would like to extend this code so that col1 is compared against col2, col3 and col4, then col2 against col3 and col4, and finally col3 against col4.

I have read through this post and I am not sure whether iteration is the solution to my problem. If it is, how can this be done?

So far I can display the rows where col2 == col3:

# -*- coding: utf-8 -*-

import pandas as pd

##      writing a dataframe
rows = {'col1':['5412','5148','5800','2122','5645','1060','4801','1039'],
        'col2':['542','512','541','412','565','562','645','152'],
        'col3':['542','3120','3410','2112','5650','5620','4801','152'],
        'col4':['5800','2122','5645','2112','412','562','562','645']
}

df = pd.DataFrame(rows)
print(f'Unsorted dataframe \n\n{df}')

##  print the rows where col2 == col3
dft = df[(df['col2'] == df['col3'])]
print('\n\nupdate - list row of matching row elements')
print(dft)

##  print all except the rows where col2 == col3
dft = df.drop(df[(df['col2'] == df['col3'])].index)
print('\n\nupdate - Dropping rows of matching row elements')
print(dft)

With this I am getting back

   col1 col2 col3  col4
0  5412  542  542  5800
7  1039  152  152   645

I would like to get back

   col1  col2 col3   col4
0  5412  542  542   5800
3  2122  412  2112  2112
4  5645  565  5650  412
5  1060  562  5620  562
6  4801  645  4801  562
7  1039  152  152   645

Solution:

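Use DataFrame.nunique with axis=1 to count the distinct values in each row, and keep the rows where that count is smaller than the number of columns. A row with fewer distinct values than columns must hold the same value in at least two columns: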

import pandas as pd

rows = {
    "col1": ["5412", "5148", "5800", "2122", "5645", "1060", "4801", "1039"],
    "col2": ["542", "512", "541", "412", "565", "562", "645", "152"],
    "col3": ["542", "3120", "3410", "2112", "5650", "5620", "4801", "152"],
    "col4": ["5800", "2122", "5645", "2112", "412", "562", "562", "645"],
}

df = pd.DataFrame(rows)

df = df[df.nunique(axis=1) < len(df.columns)]

print(df)

Output:

   col1 col2  col3  col4
0  5412  542   542  5800
3  2122  412  2112  2112
5  1060  562  5620   562
6  4801  645  4801   562
7  1039  152   152   645
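
If you prefer to spell out the pairwise comparisons described in the question (col1 against col2, col3 and col4, then col2 against col3 and col4, then col3 against col4), a minimal sketch could look like the following. It is an alternative to the nunique approach, not a replacement for it; the names mask, matching and non_matching are only illustrative. One difference worth knowing: nunique skips NaN by default (dropna=True), so on data with missing values the two approaches can disagree, whereas the sample data here has none.

```python
from itertools import combinations

import pandas as pd

rows = {
    "col1": ["5412", "5148", "5800", "2122", "5645", "1060", "4801", "1039"],
    "col2": ["542", "512", "541", "412", "565", "562", "645", "152"],
    "col3": ["542", "3120", "3410", "2112", "5650", "5620", "4801", "152"],
    "col4": ["5800", "2122", "5645", "2112", "412", "562", "562", "645"],
}

df = pd.DataFrame(rows)

# Start from an all-False mask and OR in every pairwise column comparison:
# (col1, col2), (col1, col3), (col1, col4), (col2, col3), (col2, col4), (col3, col4)
mask = pd.Series(False, index=df.index)
for left, right in combinations(df.columns, 2):
    mask = mask | (df[left] == df[right])

matching = df[mask]        # rows where at least one pair of columns is equal
non_matching = df[~mask]   # the complement, i.e. the "drop" case from the question

print(matching)
print(non_matching)
```

On the sample data, matching holds rows 0, 3, 5, 6 and 7, the same rows the nunique version returns, and non_matching holds rows 1, 2 and 4.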