For loop on column in dataframe

I am trying to calculate an equation per row in a dataframe and assign the value to a new column:

def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i+1
    result = df_region
    return result

When I run this, the column w_avg is created, but every row contains the same value.

I tried to solve this by adding [i] after the column name inside the loop:

def exercise_02():
    df_region = df.groupby(by = "region").sum()
    for i in range(len(df_region)):
        i == 0
        df_region["w_avg"][i] = df1["2018_x"][i] * df1["2018_y"][i] / df1["2018_y"][i]
        i = i+1
    result = df_region
    return result

But instead, I get this error message:

 if tolerance is not None:

KeyError: 'w_avg'

Do you have any idea what I’m doing wrong?
Thank you!

>Solution :

The great thing about DataFrames is that you do not need to loop: operations on columns are vectorised. You could change your (homework?) function to:

def exercise_02():
    df_region = df.groupby(by = "region").sum()
    # One statement computes the value for every row at once
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region
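
To illustrate why the loop produced one repeated value, here is a minimal sketch on made-up data (the region names and numbers are assumptions, only the column names come from the question). Assigning a scalar to a column broadcasts it to every row, so each pass of the loop overwrites the whole column and only the last row's result survives. The second attempt fails earlier: df_region["w_avg"][i] has to look up the column before assigning into it, and the column does not exist yet, hence the KeyError.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df_region = pd.DataFrame(
    {"2018_x": [10.0, 20.0, 30.0], "2018_y": [1.0, 2.0, 4.0]},
    index=pd.Index(["north", "south", "west"], name="region"),
)

# Loop version: the right-hand side is a scalar, so every iteration
# overwrites the entire column with that single value.
looped = df_region.copy()
for i in range(len(looped)):
    x = looped["2018_x"].iloc[i]
    y = looped["2018_y"].iloc[i]
    looped["w_avg"] = x * y / y

# Vectorised version: the right-hand side is a Series, one value per row.
vectorised = df_region.copy()
vectorised["w_avg"] = vectorised["2018_x"] * vectorised["2018_y"] / vectorised["2018_y"]

print(looped["w_avg"].tolist())      # same value in every row
print(vectorised["w_avg"].tolist())  # one value per row
```

Note that, as written, the formula x * y / y reduces to x wherever y is non-zero, so if a real weighted average is the goal, the formula itself probably needs another look.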

However, I notice that you are mixing up multiple dataframes. If they do not have the same size or index, you will run into length/index problems. Whether that happens depends on the relation between df_region, df, and df1.
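
As a rough sketch of what can go wrong when frames are mixed, assume two small made-up frames whose indexes only partly overlap (all values here are hypothetical). pandas aligns on index labels, not on position, so labels missing from either side silently become NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical frames with partly overlapping region labels.
df_region = pd.DataFrame({"2018_x": [10.0, 20.0]}, index=["north", "south"])
df1 = pd.DataFrame({"2018_y": [1.0, 2.0, 3.0]}, index=["north", "east", "west"])

# Arithmetic between Series aligns on labels: only "north" exists in both,
# so the other three labels in the union end up as NaN.
product = df_region["2018_x"] * df1["2018_y"]
print(product)
```

Checking that both frames share the same index before combining them is usually the quickest way to rule this out.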

The reason I mention this is that you, more or less accidentally(?), used variables from outside the scope of your function. Your function has no parameters, but uses df and df1 inside its body. So there is definitely something missing from your question that you either need to work out yourself or convey to the community before it can be fully answered.
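
One way to avoid depending on outside variables is to pass the dataframe in as a parameter. The sketch below assumes a single input frame with region, 2018_x and 2018_y columns. The sample data is made up, and the formula is kept exactly as in the question.

```python
import pandas as pd


def exercise_02(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the columns per region and add a w_avg column."""
    df_region = df.groupby(by="region").sum()
    # Formula copied from the question; computed for all rows at once.
    df_region["w_avg"] = df_region["2018_x"] * df_region["2018_y"] / df_region["2018_y"]
    return df_region


# Hypothetical input data, only for demonstration.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south"],
        "2018_x": [1.0, 2.0, 5.0],
        "2018_y": [2.0, 2.0, 4.0],
    }
)

result = exercise_02(df)
print(result)
```

With the input passed explicitly, the function uses a single dataframe throughout, which also removes the df versus df1 ambiguity.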
