Home How to find the change of text based on a unique value on another column in an excel file using Python

Questions

How to find the change of text based on a unique value on another column in an excel file using Python

November 30, 2022

I have a excel file containing three columns as shown below,

ID	Name	Date
117	Laspringe	2019-04-08
117	Laspringe (FT)	2020-06-16
117	Laspringe (Ftp)	2020-07-24
999	Angelo	2020-04-15
999	Angelo(FT)	2021-03-05
999	Angelo(Ftp)	2021-09-13
999	Angelo	2022-02-20

I wanted to find out that based on each ID which has the name changed from original name and changed back to the same original name. For example Angelo is changed to Angelo(FT), Angelo(Ftp) and changed back to original Angelo.

Whereas Laspringe is not changed back to the original name.

Is it possible to find out which of the ID’s have changed the name back to original using python ??

Expecting the result to be like,

ID
999

>Solution :

A simple way might be to check if the Name has any duplicate per group:

s = df.duplicated(['ID', 'Name']).groupby(df['ID']).any()
out = s[s].index.tolist()

Output: [999]

If you can have duplicates on successive dates (A -> A -> B shouldn’t be a match):

s = (df
  .sort_values(by='Date')
  .groupby('ID')['Name']
  .agg(lambda s: s[s.ne(s.shift())].duplicated().any())
)
out = s[s].index.tolist()

The two code will behave differently on this input:

    ID             Name        Date
0  117        Laspringe  2019-04-08
1  117        Laspringe  2019-04-09 # duplicated but no intermediate name
2  117   Laspringe (FT)  2020-06-16
3  117  Laspringe (Ftp)  2020-07-24
4  999           Angelo  2020-04-15
5  999       Angelo(FT)  2021-03-05
6  999      Angelo(Ftp)  2021-09-13
7  999           Angelo  2022-02-29