Finding duplicates in Dataframe and returning 1s and 0s

by MR

February 13, 2022

import pandas as pd
data_list = [['Name', 'Fruit'],
              ['Abel', 'Apple'],
              ['Abel', 'Pear'],
              ['Abel', 'Coconut'],
              ['Abel', 'Pear'],
              ['Benny', 'Apple'],
              ['Benny', 'Apple'],
              ['Cain', 'Apple'],
              ['Cain', 'Coconut'],
              ['Cain', 'Pear'],
              ['Cain', 'Lemon'],
              ['Cain', 'Orange']]

record_df = pd.DataFrame(data_list[1:], columns = data_list[0])

I am trying to create another dataframe that tells me, for each person, whether the same fruit appears more than once under their name (1 if it does, 0 if not).

Expected Output:

Name  | Repeated_Fruits
Abel  | 1
Benny | 1
Cain  | 0

I have tried:

bool_series = record_df.duplicated(subset=['Name'], keep=False)
record_df_2 = record_df[~bool_series]

But every value in bool_series is True. Am I missing something?

Solution:

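You can use pd.crosstab to count how often each fruit appears for each name, then flag the counts greater than 1 and sum the flags per name:

out = pd.crosstab(record_df.Name, record_df.Fruit).gt(1).sum(axis=1).to_frame('rep_name').reset_index()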
Out[10]: 
    Name  rep_name
0   Abel         1
1  Benny         1
2   Cain         0
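
Two caveats may be worth noting. First, summing the flags counts how many distinct fruits are repeated for each name, so a person with two different repeated fruits would get 2 rather than 1; it matches the expected output here only because nobody repeats more than one fruit. Second, the attempt in the question checks Name alone, and every name occurs more than once, which is why every row comes back True. A minimal sketch of a strict 1/0 version, assuming the column names from the question (the variable names is_repeat, flags and flags_ct are made up for illustration):

```python
import pandas as pd

# Same data as in the question.
record_df = pd.DataFrame(
    [['Abel', 'Apple'], ['Abel', 'Pear'], ['Abel', 'Coconut'], ['Abel', 'Pear'],
     ['Benny', 'Apple'], ['Benny', 'Apple'],
     ['Cain', 'Apple'], ['Cain', 'Coconut'], ['Cain', 'Pear'],
     ['Cain', 'Lemon'], ['Cain', 'Orange']],
    columns=['Name', 'Fruit'],
)

# Flag every row whose (Name, Fruit) pair occurs more than once.
# Checking both columns is the difference from subset=['Name'].
is_repeat = record_df.duplicated(subset=['Name', 'Fruit'], keep=False)

# A name gets 1 if any of its rows is flagged, otherwise 0.
flags = (
    is_repeat.groupby(record_df['Name'])
    .any()
    .astype(int)
    .rename('Repeated_Fruits')
    .reset_index()
)

# The crosstab approach with any() instead of sum() gives the same 1/0 flags.
flags_ct = (
    pd.crosstab(record_df['Name'], record_df['Fruit'])
    .gt(1)
    .any(axis=1)
    .astype(int)
    .rename('Repeated_Fruits')
    .reset_index()
)

print(flags)
```

Both variants should give one row per name with a Repeated_Fruits column that never exceeds 1, regardless of how many different fruits a person repeats.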
