How to swap multiple values within a row based on a conditional in pandas

February 23, 2023

Description of my problem

I pulled some data from an api that is unfortunately formatted. In particular, there are four columns I need to utilize in some way to solve my issue. They are label_1, odds_1, label_2, and line_2. The odds_ columns can consist of either Over or Under. Ideally, I’d want one of the odds_ columns to consist of exclusively Over while the other consists of exclusively Under. But alas, that is not how the data was formatted, so I am tasked with doing this myself. Ideally, label_ would be all Over while label_2 would be all under.

Here is a visual of my problem:

My data

odds_1  label_1 line_1  odds_2  label_2 line_2                      
-165    Under   3.5      130     Over    3.5
-137    Under   2.5      108     Over    2.5
-104    Over    10.5    -122    Under   10.5
-117    Over    26.5    -109    Under   26.5
 100    Over    2.5     -125    Under    2.5
-117    Over    14.5    -109    Under   14.5

My desired output

odds_1  label_1 line_1  odds_2  label_2 line_2                      
 130    Over    3.5     -165    Under    3.5
 108    Over    2.5     -137    Under    2.5
-104    Over    10.5    -122    Under   10.5
-117    Over    26.5    -109    Under   26.5
 100    Over    2.5     -125    Under    2.5
-117    Over    14.5    -109    Under   14.5

What I have tried

I tried using pandas.DataFrame.at, but that poses multiple problems. 1. I don’t believe there is a way to vectorize it, and 2. It doesn’t address the necessary swap of the odds_ values.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.
Visit Medevel
I have also tried numpy.where. This fixes the vectorization issue, but does not address the swap of the odds_ columns (at least the way I’m currently doing it). I haven’t discovered a way to nest np.where to make this work, but maybe I’m missing something

df = np.where(df.label_1 == 'Under', 'Over', df.label_1)

odds_1  label_1 line_1  odds_2  label_2 line_2                      
-165    Over    3.5      130    Under    3.5
-137    Over    2.5      108    Under    2.5
-104    Over    10.5    -122    Under   10.5
-117    Over    26.5    -109    Under   26.5
 100    Over    2.5     -125    Under    2.5
-117    Over    14.5    -109    Under   14.5

I have also tried pandas.DataFrame.iterrows but can’t quite get that to work either. Still not sure how to address the swap of the odds_ columns here.

for idx, row in df.iterrows():
  row.label_1 = 'Over' where some conditional?

Hope I gave enough info. Any help would be greatly appreciated. Thanks!

>Solution :

You can swap values like a, b = b, a using boolean mask to select right rows (the 2 first):

m = df['label_1'] == 'Under'
cols1 = ['odds_1', 'label_1', 'line_1']
cols2 = ['odds_2', 'label_2', 'line_2']
df.loc[m, cols1], df.loc[m, cols2] = df.loc[m, cols2].values, df.loc[m, cols1].values

Output:

>>> df
   odds_1 label_1  line_1  odds_2 label_2  line_2
0     130    Over     3.5    -165   Under     3.5
1     108    Over     2.5    -137   Under     2.5
2    -104    Over    10.5    -122   Under    10.5
3    -117    Over    26.5    -109   Under    26.5
4     100    Over     2.5    -125   Under     2.5
5    -117    Over    14.5    -109   Under    14.5