Python variable inheritance/links

I observed something that was new (and weird) to me. It certainly comes in handy at the moment, but I'd like to understand what happens in the background so that I can avoid unwanted modifications of variables.

Take the code below (the add_actual_wat_columns function): I create a new variable (wat_days) from a dictionary value and modify it, and even though I never put it back into the original dictionary (df_dict), the original dictionary is updated as well.

Is this specific to pandas or a general Python feature? Either way, how can I avoid it when I need to?

Bonus question: is there a better way to type-hint variables so that syntax highlighting and autocomplete work properly in VSCode?

import pandas as pd


def main():
    file = "wapd_le.xlsx"
    raw = pd.read_excel(file, header=[0, 1])
    df_cols = list(raw.columns.unique(level=0))
    df_cols.pop(0)
    df_list = []

    for i in df_cols:
        df_list.append(raw["Vendor data"].join(raw[i]))
    df_dict = dict(zip(df_cols, df_list))

    print(df_dict.keys())
    sum_rows(df_dict)
    add_actual_wat_columns(df_dict)
    write_to_excel(df_dict)


def add_actual_wat_columns(df_dict: dict):
    """Creates proper WAT columns for each period.

    Base report from Cashcube sums FI doc payment terms, this function divides that value
    by the monthly FI doc count, adding new columns to the WAT Days dataframe.

    Args:
        df_dict (dict): contains dataframe descriptions as keys and dataframes as values.
    """
    # TODO refactor to have toggle for WAPD/WAT and add proper columns to either (both?) sheet.
    wat_days: pd.DataFrame
    wat_days = df_dict["WAT Days"]
    periods = list(wat_days.columns)[2:]
    actual_wat_periods = [str(x) + " actual WAT" for x in periods]
    wat_days[actual_wat_periods] = wat_days[periods].div(
        df_dict["Count (FI Document Number)"][periods]
    )
    wat_days["Sum actual WAT"] = wat_days[actual_wat_periods[0:-1]].mean(
        axis=1, numeric_only=True
    )
    wat_days.rename(columns={"Sum actual WAT": "Avg actual WAT"}, inplace=True)

Solution:

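This is not specific to pandas; it is how assignment works for every Python object. wat_days = df_dict["WAT Days"] does not copy anything, it just binds a second name to the same DataFrame that df_dict holds. Since that object is mutable, modifying it through wat_days modifies the one object both names refer to.*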

wat_days = df_dict["WAT Days"] # wat_days is the same object stored in df_dict (no copy)
...
wat_days[actual_wat_periods] = ... # modifies that shared object in place
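The same aliasing can be reproduced without pandas. The sketch below is a stand-in, not the original code: a plain dict of lists plays the role of the "WAT Days" DataFrame, and the names df_dict and wat_days are reused only for illustration. The `is` operator checks whether two names refer to one object.

```python
# Hypothetical stand-in: a dict of lists instead of a pandas DataFrame.
df_dict = {"WAT Days": {"Jan": [10, 20]}}

wat_days = df_dict["WAT Days"]          # binds a second name; nothing is copied
print(wat_days is df_dict["WAT Days"])  # True: one object, two names

wat_days["Jan actual WAT"] = [5, 10]    # mutate through the new name...
print("Jan actual WAT" in df_dict["WAT Days"])  # True: ...and the dict sees it

wat_days = {"something": "else"}       # rebinding the name does NOT touch df_dict
print(df_dict["WAT Days"])              # {'Jan': [10, 20], 'Jan actual WAT': [5, 10]}
```

Only mutation (item assignment, in-place methods) is shared; assigning a new object to wat_days just points that one name elsewhere.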

Another example:

things = {1: 2, 2: [3]}
x = things[1]
x += 1
y = things[2]
y[0] += 1
y += [1]
z = things[2][0]
z += 1
print(things)
# {1: 2, 2: [4, 1]}

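Do you see what is going on?
If you need a new, independent copy of a mutable object, have a look at the copy module. pandas objects also have their own copy method, e.g. wat_days = df_dict["WAT Days"].copy(), which makes a deep copy by default.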
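A minimal sketch of the difference between sharing, a shallow copy and a deep copy, using the standard-library copy module on made-up nested lists. For the DataFrame in the question, DataFrame.copy() is the more natural tool; this example only illustrates the general Python behaviour.

```python
import copy

original = {"WAT Days": [[1, 2], [3, 4]]}

alias = original                 # same object, no copy at all
shallow = copy.copy(original)    # new outer dict, but the inner lists are shared
deep = copy.deepcopy(original)   # new dict and new inner lists, all the way down

original["WAT Days"][0].append(99)  # mutate a nested list in place
original["Count"] = []              # add a key to the outer dict

print(alias["WAT Days"][0])    # [1, 2, 99]  alias shares everything
print(shallow["WAT Days"][0])  # [1, 2, 99]  nested list is still shared
print("Count" in shallow)      # False       but the outer dict is its own
print(deep["WAT Days"][0])     # [1, 2]      fully independent
```

A shallow copy is enough when only the top-level container gets modified; a deep copy is needed when nested objects are mutated as well.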

*Note that "it's mutable" is arguably a backwards explanation. In the example, x and z are not modified in place at all: ints are immutable, so += creates a new int and rebinds the name to it. The name-to-object binding itself works the same way for x, y and z.
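To make the footnote concrete, the sketch below replays the things example with identity checks and adds one contrast that is not in the original answer: for a list, y = y + [100] builds a new list and rebinds y, whereas y += [1] extends the existing list in place.

```python
things = {1: 2, 2: [3]}

x = things[1]
x += 1                  # int is immutable: += creates a new int and rebinds x
print(x is things[1])   # False: things[1] is still 2

y = things[2]
y[0] += 1               # item assignment on the shared list
y += [1]                # list += extends the same list in place
print(y is things[2])   # True: still one list

y = y + [100]           # + builds a new list; only the name y is rebound
print(y is things[2])   # False
print(things)           # {1: 2, 2: [4, 1]}
```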
