Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine 2 string columns in pandas with different conditions in both columns with another condition within

I have the following data frame, there are other categories besides cat1 but I only want to make a change to the part of the dataframe where the category is cat1.

name   short code     category
thyrax thx   thxar.po cat1
gha    gh    gh.cd    cat1
play   pl    pl.v     cat1
xxdx   xd    xda.v    cat1
......

For cases where everything after the . in the code column is anything besides .cd, I want the short column to take what’s in the short column + what is after the . in the code column and become something like thx.po, but in cases where there is a cd, I want it to become .cn. I want the output to look like this,

name   short    code     category
thyrax thx.po   thxar.po cat1
gha    gh.cn    gh.cd    cat1
play   pl.v     pl.v     cat1
xxdx   xd.v     xda.v    cat1
......

I don’t know how to add a condition that IF the category is cat1 and the code after the . is .cd, make it into short + what’s after the . +cn.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want the same condition for everything unless what is after the . to be copied, but if what is after the . is a .cn, I want it to be a .cd. What is the best way to do it?

I have gotten this code so far,

df['short'] = (df['short'].add("."+df['code'].str.split(".").str[-1]).where(df['category'].eq("cat1"),df['short']))

But I can’t figure out how to add the condition where if in the code column, what appears after the . is .cd and to have something different happen.

So basically my conditions are this,

firstly, category must be cat1,

then take what is in the short column and merge it with what is after the . in the code column.

If what is after the . in the code column is cd, make it into cn.

>Solution :

Try this:

df['short'] = df['short'].astype(str) + np.where(df['category'].eq('cat1'), df['code'].astype(str).str.extract('(\..+)')[0].replace('.cd', '.cn'), '')

Output:

>>> df
     name   short      code category
0  thyrax  thx.po  thxar.po     cat1
1     gha      gh     gh.cd     cat2
2     gha   gh.cn     gh.cd     cat1
3    play    pl.v      pl.v     cat1
4    xxdx      xd     xda.v     cat2
5    xxdx      xd     xda.v     cat2
6    xxdx    xd.v     xda.v     cat1

(I added some dummy rows in the above data to demonstrate that it works with the right category, cat1.)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading