Home Find percent difference between two columns, that share same root name, but differ in suffix

Questions

Find percent difference between two columns, that share same root name, but differ in suffix

May 24, 2022

My question is somewhat similar to subtracting-two-columns-named-in-certain-pattern

I’m having trouble performing operations on columns that share the same root substring, without a loop. Basically I want to calculate a percentage change using columns that end with ‘_PY’ with another column that shares the same name except for the suffix.

What’s a possible one line solution, or one that doesn’t involve a for loop?

url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']

PY_cols = [col for col in df.columns if col.endswith("PY")]
reg_cols = [col.split("_PY")[0] for col in PY_cols]

for k,v in zip(reg_cols,PY_cols):
    df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
    
df

>Solution :

You can use:

v = (df[df.columns[df.columns.str.endswith('_PY')]]
       .rename(columns=lambda x: x.rsplit('_', maxsplit=1)[0]))
k = df[v.columns]

out = pd.concat([df, k.sub(v).div(v).mul(100).round(2).add_suffix('_YOY%')], axis=1)