First of all, my English is not my native language, sorry for my mistakes.
I am looking to automate some tasks done via Excel with Python.
I have a dataframe ordered by date/time, and I want to check if a customer has contacted me again after already having a response.
So I have a dataframe like this:
| Date | Tel |
| ---------- | ------------ |
| 01-01-2023 | +33000000001 |
| 01-01-2023 | +33000000002 |
| 01-01-2023 | +33000000003 |
| 02-01-2023 | +33000000002 |
| 02-01-2023 | +33000000004 |
I’d like to add a column TRUE/FALSE if my client has contacted me later :
| Date | Tel | Re-contact |
| ------------ | ------------ | ------------ |
| 01-01-2023 | +33000000001 | FALSE |
| 01-01-2023 | +33000000002 | TRUE |
| 01-01-2023 | +33000000003 | FALSE |
| 02-01-2023 | +33000000002 | FALSE |
| 02-01-2023 | +33000000004 | FALSE |
In Excel, I do this action as follows:
COUNTIFS(A2:A$5;A1)>0
And I would get my TRUE/FALSE if the phone number exists further in my list.
I looked at the documentation to see if a value existed in a list, but I couldn’t find a way to see if it existed further down. Also, I’m looking for a quick way to calculate it, as I have 100,000 rows in my dataframe.
# I've tried this so far:
length = len(df.index) - 1
i = 1
for i in range(i, length):
print(i)
for x in df['number']:
if x in df['number'][[i+1, length]]:
df['Re-contact'] = 'TRUE'
else:
df['Re-contact'] = 'FALSE'
i += 1
It feels very wrong to me, and my code takes too much time. I’m looking for a more efficient way to perform what I’m trying to do.
>Solution :
Use pandas.DataFrame.duplicated
over Tel
column to find repeated calls:
df['Re-contact'] = df.Tel.duplicated(keep='last')
Date Tel Re-contact
0 01-01-2023 33000000001 False
1 01-01-2023 33000000002 True
2 01-01-2023 33000000003 False
3 02-01-2023 33000000002 False
4 02-01-2023 33000000004 False