I have a large dataframe containing many thousands of URNs, each with a country code at the end. I would like to create a new column that isolates those country codes.
My df:
urn
0 CS16-1533232-GB
1 CS16-1533233-GB
2 CS16-1533234-GB
3 CS16-1533235-BZ
4 CS16-1533238-GB
Desired output:
urn country
0 CS16-1533232-GB GB
1 CS16-1533233-GB GB
2 CS16-1533234-GB GB
3 CS16-1533235-BZ BZ
4 CS16-1533238-GB GB
>Solution :
If the country code is always exactly 2 letters, simply slice the last two characters:
df['country'] = df['urn'].str[-2:]
Otherwise, extract the trailing letters with str.extract:
df['country'] = df['urn'].str.extract(r'([A-Z]+)$', expand=False)
# or capture everything after the last "-"
df['country'] = df['urn'].str.extract(r'([^-]+)$', expand=False)
Output:
urn country
0 CS16-1533232-GB GB
1 CS16-1533233-GB GB
2 CS16-1533234-GB GB
3 CS16-1533235-BZ BZ
4 CS16-1533238-GB GB
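To see where the approaches differ, here is a minimal self-contained sketch. It uses two URNs from the question plus one hypothetical URN ending in a 3-letter code ("USA"), which is an assumption added purely for illustration. The `str.rsplit` variant is not part of the answer above; it is shown as a regex-free alternative.

```python
import pandas as pd

# Sample URNs from the question, plus one hypothetical 3-letter code
# ("USA") added here only to show where fixed-width slicing breaks down.
df = pd.DataFrame({"urn": [
    "CS16-1533232-GB",
    "CS16-1533235-BZ",
    "CS16-1533239-USA",  # hypothetical, not in the original data
]})

# Fixed-width slice: always takes the last two characters.
sliced = df["urn"].str[-2:]

# Regex: trailing run of uppercase letters, whatever its length.
letters = df["urn"].str.extract(r"([A-Z]+)$", expand=False)

# Regex: everything after the last "-".
last_field = df["urn"].str.extract(r"([^-]+)$", expand=False)

# Alternative without a regex: split once from the right, keep the last piece.
split_last = df["urn"].str.rsplit("-", n=1).str[-1]

print(sliced.tolist())      # ['GB', 'BZ', 'SA']
print(letters.tolist())     # ['GB', 'BZ', 'USA']
print(last_field.tolist())  # ['GB', 'BZ', 'USA']
print(split_last.tolist())  # ['GB', 'BZ', 'USA']
```

The slice is the cheapest option but silently truncates anything longer than two characters, so it is only safe when the code length is guaranteed.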