Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Iterating over dataframe rows with for loop – problem with .loc

I have the two following dataframes (df1 and df2).

df1:

  code           name          region
0  AFG    Afghanistan     Middle East
1  NLD    Netherlands  Western Europe
2  AUT        Austria  Western Europe
3  IRQ           Iraq     Middle East
4  USA  United States   North America
5  CAD         Canada   North America

df2:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   code  year  gdp per capita
0   AFG  2010          547.35
1   NLD  2010        44851.27
2   AUT  2010         3577.10
3   IRQ  2010         4052.06
4   USA  2010        52760.00
5   CAD  2010        41155.32
6   AFG  2015          578.47
7   NLD  2015        45175.23
8   AUT  2015         3952.80
9   IRQ  2015         4688.32
10  USA  2015        56863.37
11  CAD  2015        43635.10

Instead of merging the two dataframes, I would like to add the respective region from df1 as a new column to df2, using either iterrows() or a for loop.

When I call

for i in range(len(df2)):
  region = df1.loc[(df1["code"] == df2.loc[i, "code"]), "region"]
  df2.loc[i, "region"] = region

or

for index, row in df2.iterrows():
  region = df1.loc[df1["code"] == row["code"], "region"]
  df2.loc[index, "region"] = region

I get the error message "ValueError: Incompatible indexer with Series". I think it has to do with the last .loc call df2.loc[i, "region"] = region and df2.loc[index, "region"] = region, because I am using a number and a string at the same time. But I have also done that for calculating region in the for loop and there is no error message.

Your help would be appreciated.

>Solution :

Because region is a Series (not a scalar, for example), pandas thinks you want to assign it by index to df2.loc[i, "region"] and this causes the error.

In fact, you are looking to get at the singleton item in the series and assign that. You can achieve this by doing the following after initializing region: region = region.to_list()[0].

Here is a modified version of your code which should work:

for i in range(len(df2)):
    region = (df1.loc[(df1["code"] == df2.loc[i, "code"]), "region"]).to_list()[0]
    df2.loc[i, "region"] = region

Output:

   code  year  gdp per capita          region
0   AFG  2010            99.0     Middle East
1   NLD  2010            99.0  Western Europe
2   AUT  2010            99.0  Western Europe
3   IRQ  2010            99.0     Middle East
4   USA  2010            99.0   North America
5   CAD  2010            99.0   North America
6   AFG  2015            99.0     Middle East
7   NLD  2015            99.0  Western Europe
8   AUT  2015            99.0  Western Europe
9   IRQ  2015            99.0     Middle East
10  USA  2015            99.0   North America
11  CAD  2015            99.0   North America

Because you have said you do not want to "merge" the dataframes, I assume you are trying to avoid both merge() and join(), otherwise I would have suggested the following:

df2 = df2.join(df1.drop(columns='name').set_index('code'), on='code')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading