I have the following data frame, there are other categories besides cat1 but I only want to make a change to the part of the dataframe where the category is cat1.
name short code category
thyrax thx thxar.po cat1
gha gh gh.cd cat1
play pl pl.v cat1
xxdx xd xda.v cat1
......
For cases where everything after the . in the code column is anything besides .cd, I want the short column to take what’s in the short column + what is after the . in the code column and become something like thx.po, but in cases where there is a cd, I want it to become .cn. I want the output to look like this,
name short code category
thyrax thx.po thxar.po cat1
gha gh.cn gh.cd cat1
play pl.v pl.v cat1
xxdx xd.v xda.v cat1
......
I don’t know how to add a condition that IF the category is cat1 and the code after the . is .cd, make it into short + what’s after the . +cn.
I want the same condition for everything unless what is after the . to be copied, but if what is after the . is a .cn, I want it to be a .cd. What is the best way to do it?
I have gotten this code so far,
df['short'] = (df['short'].add("."+df['code'].str.split(".").str[-1]).where(df['category'].eq("cat1"),df['short']))
But I can’t figure out how to add the condition where if in the code column, what appears after the . is .cd and to have something different happen.
So basically my conditions are this,
firstly, category must be cat1,
then take what is in the short column and merge it with what is after the . in the code column.
If what is after the . in the code column is cd, make it into cn.
>Solution :
Try this:
df['short'] = df['short'].astype(str) + np.where(df['category'].eq('cat1'), df['code'].astype(str).str.extract('(\..+)')[0].replace('.cd', '.cn'), '')
Output:
>>> df
name short code category
0 thyrax thx.po thxar.po cat1
1 gha gh gh.cd cat2
2 gha gh.cn gh.cd cat1
3 play pl.v pl.v cat1
4 xxdx xd xda.v cat2
5 xxdx xd xda.v cat2
6 xxdx xd.v xda.v cat1
(I added some dummy rows in the above data to demonstrate that it works with the right category, cat1.)