python pandas compare columns and add list of columns that differ

May 28, 2022

I would like to compare multiple columns in a data frame and add a new column that tells me which columns are different for each row.

For example, in this DataFrame I want to compare a1 with a2 and b1 with b2:

   a1  b1  a2  b2
0   1   2   1   2
1   1   2   1   3
2   1   2   3   4

The output should look something like this:

   a1  b1  a2  b2  diff
0   1   2   1   2  
1   1   2   1   3  'b1-b2'
2   1   2   3   4  'a1-a2,b1-b2'

This is what I have so far:

import numpy as np
import pandas as pd
data = [{'a1': 1, 'b1': 2, 'a2':1, 'b2':2},
        {'a1':1, 'b1': 2, 'a2': 1, 'b2':3},
        {'a1':1, 'b1': 2, 'a2':3 , 'b2':4}]
df = pd.DataFrame(data)

compare = [('a1','a2'),('b1','b2')]
comp_result = np.array([(df[x[0]] != df[x[1]]) for x in compare])

comp_result is a 2-D boolean array with one row per comparison and one True/False entry per DataFrame row, but I am not sure how to use it to create the "diff" column.

>Solution :

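A vectorized approach with no row-wise loop (only a short comprehension over the column pairs). It groups columns by the first character of their names, so it assumes every group holds exactly two columns. It also relies on groupby(..., axis=1), which is deprecated in recent pandas releases, and on df having the default integer index, because the new Series is aligned to df by index: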

col_groups = [c.columns for _, c in df.groupby(df.columns.str[0], axis=1)]

df['diff'] = pd.Series(np.sum([(df[l] != df[r]).map({True: f'{l}-{r}',False:''}) + ',' for l, r in col_groups], axis=0)).str.strip(',')

Output:

>>> df
   a1  b1  a2  b2         diff
0   1   2   1   2             
1   1   2   1   3        b1-b2
2   1   2   3   4  a1-a2,b1-b2
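
If the pairs are already listed explicitly, as in the question's compare list, a variant that does not depend on column-name prefixes or on groupby(..., axis=1) may be easier to maintain. The following is only a sketch under that assumption; the names mask and labels are introduced here for illustration and are not part of the original answer. It builds one boolean column per pair and then joins the labels of the True entries in each row. Note that NaN compares as different from NaN, so missing values on both sides would be reported as a difference.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"a1": [1, 1, 1], "b1": [2, 2, 2],
                   "a2": [1, 1, 3], "b2": [2, 3, 4]})
compare = [("a1", "a2"), ("b1", "b2")]

# One boolean column per pair, named "left-right"; True where the pair differs.
mask = pd.DataFrame({f"{l}-{r}": df[l].ne(df[r]) for l, r in compare},
                    index=df.index)

# For each row, keep the labels whose mask entry is True and join them.
# Assigning a plain list is positional, so any index on df is fine.
labels = mask.columns.to_numpy()
df["diff"] = [",".join(labels[row]) for row in mask.to_numpy()]

print(df)
```

The per-row join is a Python-level loop, so on very large frames the string-concatenation approach above may be faster; this version trades some speed for explicit pairs and independence from the index.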

dataframe

byMR
