How do I get variable length slices of values using Pandas?

July 19, 2024

I have data that includes a full name and a first name, and I need to make a new column with the last name. I can assume that the full name minus the first name gives the last name.

I’ve been trying to use slice with a start index equal to the length of the first name + 1. But that index is a Series, not an integer, so the result is NaN for every row.

The commented lines show the things I tried. It took me a while to realize that the Series/integer mismatch was the issue. It seems this shouldn’t be so difficult.

Thanks

import pandas as pd

columns = ['Full', 'First']
data = [('Joe Smith', 'Joe'), ('Bobby Sue Ford', 'Bobby Sue'), ('Current Resident', 'Current Resident'), ('', '')]
df = pd.DataFrame(data, columns=columns)

#first_chars = df['First'].str.len() + 1

#last = df['Full'].str[4:]
#last = df['Full'].str[first_chars:]
#last = df['Full'].str.slice(first_chars)
#last = df.Full.str[first_chars:]
#pd.DataFrame.insert(df, loc=2, column='Last', value=last)

#df['Last'] = df.Full.str[first_chars:]
#df['Last'] = str(df.Full.str[first_chars:])

#first_chars = int(first_chars)
#df['Last'] = df['Full'].apply(str).apply(lambda x: x[first_chars:])
df['Last'] = df['Full'].str.slice(df['First'].str.len() + 1)

print(df)

Solution:

Use apply with axis=1 to work row by row: remove each row's first name from its full name, then strip the leftover whitespace:

df['Last'] = df.apply(lambda row: row['Full'].replace(row['First'], '').strip(), axis=1)

               Full             First   Last
0         Joe Smith               Joe  Smith
1    Bobby Sue Ford         Bobby Sue   Ford
2  Current Resident  Current Resident       
3
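
Since the question asks for variable-length slices specifically, here is a minimal sketch that slices each full name by the length of its own first name, using a plain list comprehension over the two columns. It assumes, as the question does, that every Full value starts with its First value. The extra row ('Joe Joeson', 'Joe') is hypothetical and not part of the original data; it is there only to show a case where replace would also remove the second 'Joe' and leave 'son'.

```python
import pandas as pd

columns = ['Full', 'First']
data = [
    ('Joe Smith', 'Joe'),
    ('Bobby Sue Ford', 'Bobby Sue'),
    ('Current Resident', 'Current Resident'),
    ('', ''),
    ('Joe Joeson', 'Joe'),  # hypothetical row, not in the original data
]
df = pd.DataFrame(data, columns=columns)

# Pair each full name with its own first name and slice off that many
# characters. Assumes Full always starts with First. strip() removes the
# separating space, so no "+ 1" is needed on the index.
df['Last'] = [
    full[len(first):].strip()
    for full, first in zip(df['Full'], df['First'])
]

print(df)
```

The slice start is a plain integer for each row here, which sidesteps the Series/integer problem from the question, since the .str accessor expects a single integer for the slice start.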