Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Why am I getting KeyErrors when accessing multiple rows in a dataframe?

I am trying to write the following condition to map MARKET_STATES.OPENING_AUCTION, however I am getting KeyError: (False, False, False) when I run the following:

market_info_df.loc[market_info_df['issue_status' == '10', 'market_phase' == '0',
                                          'trading_state' == '  '].iloc[0]] = MARKET_STATES.OPENING_AUCTION

Two questions:

  • What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?
  • How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

Here is an example:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

e.g:

row  issue_status  market_phase  trading_state                  trade_type
 0        20           0              '  '         ->        
 1        10           0              '  '         ->        OPENING_AUCTION
 2        20           0              '  '         ->       
 3        10           0              '  '         ->       
 4        10           0              '  '         ->       
 5        10           0              '  '         ->       

>Solution :

I’ll answer your second question first:

How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

market_info_df = pd.DataFrame({'issue_status': {0: 20, 1: 10, 2: 20, 3: 10, 4: 10, 5: 10},
 'market_phase': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'trading_state': {0: '  ', 1: '  ', 2: '  ', 3: '  ', 4: '  ', 5: '  '}})

Any time you want to combine multiple conditions, remember that you need a logical and or logical or operation on those conditions.

When you have three conditions and you want to select rows where all of them are True, you need to and all the conditions. So in your case you’d need to do

mask = (market_info_df['issue_status'] == 10) & (market_info_df['market_phase'] == 0) & (market_info_df['trading_state'] == '  ')

To get a boolean mask that tells you which rows fulfill all those conditions. The & operator is overloaded by pandas to do the element-wise logical and for multiple Series objects. The parentheses are needed to override python’s operator precedence which puts & before ==

Note: My dataframe contains integers in the issue_status and market_phase columns, which is why I compare against integers e.g. market_info_df['issue_status'] == 10. If your columns contain strings (as your code would indicate), compare against strings (market_info_df['issue_status'] == "10")

Next, you want to set those rows for the "trade_type" column, so you’d do:

market_info_df.loc[mask, "trade_type"] = "OPENING_AUCTION"

Which sets those only rows:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                OPENING_AUCTION
4            10             0                OPENING_AUCTION
5            10             0                OPENING_AUCTION

Now, we can use idxmax() to get the index of the max value of the mask. Since this mask only contains True and False values, the first occurrence of the True value is the one that is returned. If we do this instead of setting all values of mask, we get:

market_info_df.loc[mask.idxmax(), "trade_type"] = "OPENING_AUCTION"

which gives:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                            NaN
4            10             0                            NaN
5            10             0                            NaN

As for your first question:

What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?

I have no idea what that code is supposed to do. The code inside the brackets i.e.:

'issue_status' == '10', 'market_phase' == '0', 'trading_state' == '  '

compares the string "issue_status" to the string "10" (which unsurprisingly is False) and so on, and returns a tuple that contains three Falses.

Then, market_info_df[False, False, False] tries to find the element in the dataframe at the False, False, False key, and throws the error when it fails to do so.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading