Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create a new variable that tells us if the value in a given column is unique or not?

I’m working on the titanic dataset and figured that determining if a ticket is unique or not might be predictive. I’ve written some code below that tells us if the value is unique or not but I have a feeling that there is a much cleaner way of doing this.

 import pandas as pd
dummy_data = pd.DataFrame({"Ticket" : ['A','A','B','B','C','D']
                     })

dummy_data
counts = dummy_data['Ticket'].value_counts()
a = counts[counts>1] 
b=a.reset_index(name='Quantity')
b = b.rename(columns={'index': 'Tickets'})
b
dummy_data['Ticket_multiple'] = np.where(dummy_data['Ticket'].isin(b['Tickets']),1,0)
dummy_data.head(10)

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

You can use pandas.Series.duplicated with keep=False:

Parameters: keep{‘first’, ‘last’, False}, default ‘first’ :
Method to handle dropping duplicates:

  • ‘first’ : Mark duplicates as True except for the first occurrence.
  • ‘last’ : Mark duplicates as True except for the last occurrence.
  • False : Mark all duplicates as True.
dummy_data['Ticket_multiple'] = dummy_data["Ticket"].duplicated(keep=False).astype(int)


Output :

print(dummy_data)

  Ticket  Ticket_multiple
0      A                1
1      A                1
2      B                1
3      B                1
4      C                0
5      D                0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading