My question is somewhat similar to subtracting-two-columns-named-in-certain-pattern
I’m having trouble performing operations on columns that share the same root substring, without a loop. Basically I want to calculate a percentage change using columns that end with ‘_PY’ with another column that shares the same name except for the suffix.
What’s a possible one line solution, or one that doesn’t involve a for loop?
url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']
PY_cols = [col for col in df.columns if col.endswith("PY")]
reg_cols = [col.split("_PY")[0] for col in PY_cols]
for k,v in zip(reg_cols,PY_cols):
df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
df
>Solution :
You can use:
v = (df[df.columns[df.columns.str.endswith('_PY')]]
.rename(columns=lambda x: x.rsplit('_', maxsplit=1)[0]))
k = df[v.columns]
out = pd.concat([df, k.sub(v).div(v).mul(100).round(2).add_suffix('_YOY%')], axis=1)