Python – create a new column based on the last two letters of a string

I have a large DataFrame containing many thousands of URNs, each ending in a country code. I would like to create a new column that isolates those country codes:

My df:

    urn
0   CS16-1533232-GB
1   CS16-1533233-GB
2   CS16-1533234-GB
3   CS16-1533235-BZ 
4   CS16-1533238-GB

Desired output:

    urn              country
0   CS16-1533232-GB   GB
1   CS16-1533233-GB   GB
2   CS16-1533234-GB   GB
3   CS16-1533235-BZ   BZ
4   CS16-1533238-GB   GB

Solution:

If the country code is always exactly two characters long, simply slice the last two characters:

df['country'] = df['urn'].str[-2:]

Otherwise, if the code can vary in length, capture the trailing characters with str.extract:

df['country'] = df['urn'].str.extract(r'([A-Z]+)$', expand=False)
# or capture everything after the last "-"
df['country'] = df['urn'].str.extract(r'([^-]+)$', expand=False)

Output:

               urn country
0  CS16-1533232-GB      GB
1  CS16-1533233-GB      GB
2  CS16-1533234-GB      GB
3  CS16-1533235-BZ      BZ
4  CS16-1533238-GB      GB
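
To see where the approaches differ, here is a minimal, self-contained sketch. The last two rows (a three-letter code and a trailing space) are made-up examples that do not appear in the question, and the column names by_slice, by_regex and by_split are only illustrative. It also shows str.rsplit as one more possible way to take everything after the last "-"; that variant is an addition, not part of the answer above.

```python
import pandas as pd

# Sample data: the first two rows mirror the question; the last two are
# hypothetical and only there to show where the approaches differ.
df = pd.DataFrame({
    "urn": [
        "CS16-1533232-GB",
        "CS16-1533235-BZ",
        "CS16-1533239-USA",  # assumed three-letter code
        "CS16-1533240-GB ",  # assumed stray trailing space
    ]
})

# Strip surrounding whitespace first, so that "$" and negative slices
# operate on the real end of each string.
urn = df["urn"].str.strip()

# Fixed width: always the last two characters.
df["by_slice"] = urn.str[-2:]

# Variable width: the run of capital letters at the end of the string.
df["by_regex"] = urn.str.extract(r"([A-Z]+)$", expand=False)

# Variable width: split once from the right on "-" and keep the last piece.
df["by_split"] = urn.str.rsplit("-", n=1).str[-1]

print(df)
```

With this data, the slice returns "SA" for the three-letter row, while the regex and the split both return "USA". If the real column can contain trailing whitespace, stripping it first as above is probably the safer choice for all three variants.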
