import pandas as pd
data_list = [['Name', 'Fruit'],
['Abel', 'Apple'],
['Abel', 'Pear'],
['Abel', 'Coconut'],
['Abel', 'Pear'],
['Benny', 'Apple'],
['Benny', 'Apple'],
['Cain', 'Apple'],
['Cain', 'Coconut'],
['Cain', 'Pear'],
['Cain', 'Lemon'],
['Cain', 'Orange']]
record_df = pd.DataFrame(data_list[1:], columns = data_list[0])
I am trying to create another DataFrame that tells me, for each person, how many fruits they have more than once.
Expected output:
Name | Repeated_Fruits
Abel | 1
Benny| 1
Cain | 0
I have tried:
bool_series = record_df.duplicated(subset=['Name'], keep=False)
record_df_2 = record_df[~bool_series]
But every value in bool_series is True, so record_df_2 ends up empty. Am I missing a step?
Solution:
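You can use pd.crosstab. It counts how often each (Name, Fruit) pair occurs; .gt(1) flags the counts above one, and summing across each row gives the number of repeated fruits per name.
out = pd.crosstab(record_df.Name, record_df.Fruit).gt(1).sum(axis=1).to_frame('rep_name').reset_index()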
Out[10]:
Name rep_name
0 Abel 1
1 Benny 1
2 Cain 0
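
For comparison, here is a minimal sketch of why the original attempt returned all True, together with an alternative built on duplicated instead of crosstab. It is not part of the answer above; the variable names name_only, pair_dup, names, and repeated are made up for illustration, and the result column is named Repeated_Fruits to match the expected output in the question.

```python
import pandas as pd

record_df = pd.DataFrame(
    [["Abel", "Apple"], ["Abel", "Pear"], ["Abel", "Coconut"], ["Abel", "Pear"],
     ["Benny", "Apple"], ["Benny", "Apple"],
     ["Cain", "Apple"], ["Cain", "Coconut"], ["Cain", "Pear"],
     ["Cain", "Lemon"], ["Cain", "Orange"]],
    columns=["Name", "Fruit"],
)

# Every name occurs in more than one row, so checking 'Name' alone
# marks every row as a duplicate.
name_only = record_df.duplicated(subset=["Name"], keep=False)

# Checking the (Name, Fruit) pair marks only rows whose pair occurs again.
pair_dup = record_df.duplicated(subset=["Name", "Fruit"], keep=False)

# Keep one row per repeated pair, count them per name, and reindex so
# that names without any repeated fruit show up with a count of 0.
names = record_df["Name"].unique()
repeated = (
    record_df[pair_dup]
    .drop_duplicates()
    .groupby("Name")
    .size()
    .reindex(names, fill_value=0)
    .rename_axis("Name")
    .reset_index(name="Repeated_Fruits")
)
print(repeated)
```

On this data, repeated should agree with the crosstab result: one repeated fruit each for Abel and Benny, none for Cain. The crosstab version is shorter; this one avoids building a wide Name-by-Fruit table, which may matter when there are many distinct fruits.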