Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find percent difference between two columns, that share same root name, but differ in suffix

My question is somewhat similar to subtracting-two-columns-named-in-certain-pattern

I’m having trouble performing operations on columns that share the same root substring, without a loop. Basically I want to calculate a percentage change using columns that end with ‘_PY’ with another column that shares the same name except for the suffix.

What’s a possible one line solution, or one that doesn’t involve a for loop?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']

PY_cols = [col for col in df.columns if col.endswith("PY")]
reg_cols = [col.split("_PY")[0] for col in PY_cols]

for k,v in zip(reg_cols,PY_cols):
    df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
    
df
    

>Solution :

You can use:

v = (df[df.columns[df.columns.str.endswith('_PY')]]
       .rename(columns=lambda x: x.rsplit('_', maxsplit=1)[0]))
k = df[v.columns]

out = pd.concat([df, k.sub(v).div(v).mul(100).round(2).add_suffix('_YOY%')], axis=1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading