Apply function to several rows and columns of a pandas dataframe using df.loc

I have a pandas dataframe that has incorrect values at certain rows and columns. An example dataframe would be something like this:

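import pandas as pd

dict1 = {'Total Points': [10, 20, 30, 40, 50],
         'A pts': [8, 'X', 20, 30, 40],
         'A pts percentage': [0.80, 0.75, 0.67, 0.75, 0.80],
         'B pts': [2, 5, 'X', 'X', 10],
         'B pts percentage': [0.20, 0.25, 0.33, 0.25, 0.20]
        }

df = pd.DataFrame(dict1)
cols = ["A pts", "B pts"]  # columns that may contain the placeholder 'X'
pct_cols = [2, 4]          # positions of the matching percentage columns

# index labels of the rows where any column in cols holds 'X'
bad_rows = df.index[df[cols].isin(['X']).any(axis=1)]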

I can calculate which rows have the value I want to remove. What I want to do is calculate the actual points scored by taking the column to the right (the percentage) and multiplying it by the total points scored for that row. Ideally, I would only do this for the cells that actually hold the bad value. I know I can use

df.loc[bad_rows, cols] = value

to set all values at those rows and columns to a single value, but I don’t know how to assign a different value to each cell. I tried doing

df.loc[bad_rows, cols] = df.iloc[bad_rows, pct_cols] * df.loc[bad_rows, "Total Points"]

but that doesn’t work. Any input would be greatly appreciated.

Edit: I know I could simply apply a function to the entire dataframe for certain columns, but I’m trying to avoid doing that if possible, particularly because the dataset I’m working with is quite large and that would be incredibly inefficient. I know I could simply use for loops to iterate through everything and set each value one at a time, but that also seems inefficient.

Solution:
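
Before the fix, it may help to see why the attempt in the question likely fails. The sketch below uses a cut-down three-row frame (the names `pct`, `total`, `misaligned` and `aligned` are made up for illustration). With the plain `*` operator, pandas matches the index of the Series against the column names of the DataFrame, and since row labels and column names have nothing in common, every cell should come out as NaN. `mul(..., axis=0)` matches on the row index instead.

```python
import pandas as pd

# Cut-down frame for illustration only (not the full data from the question).
df = pd.DataFrame({
    "Total Points": [10, 20, 30],
    "A pts percentage": [0.80, 0.75, 0.67],
    "B pts percentage": [0.20, 0.25, 0.33],
})
rows = [1, 2]
pct = df.loc[rows, ["A pts percentage", "B pts percentage"]]
total = df.loc[rows, "Total Points"]

# `*` lines up the Series index (1, 2) with the column names of `pct`.
# No labels match, so the result holds only NaN.
misaligned = pct * total

# axis=0 lines up the Series index with the row index of `pct` instead,
# so each row is scaled by its own total.
aligned = pct.mul(total, axis=0)
```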

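Use mul with axis=0 so that the Total Points Series is aligned on the row index rather than on the column names, and take .values so that the result is assigned by position instead of by column label: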

df.loc[bad_rows, cols] = df.iloc[bad_rows, pct_cols].mul(df.loc[bad_rows, 'Total Points'], axis=0).values
print(df)

# Output
   Total Points A pts  A pts percentage B pts  B pts percentage
0            10     8              0.80     2              0.20
1            20  15.0              0.75   5.0              0.25
2            30  20.1              0.67   9.9              0.33
3            40  30.0              0.75  10.0              0.25
4            50    40              0.80    10              0.20
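
Note that in the output above every cell of a bad row is overwritten, including good ones (row 2, 'A pts' goes from 20 to 20.1). If only the cells that actually hold 'X' should change, one possible variant is to handle each points column separately with a boolean mask. This is a sketch rather than part of the original answer, and it refers to the percentage columns by name instead of by position; the loop runs once per column, not once per row.

```python
import pandas as pd

df = pd.DataFrame({
    "Total Points": [10, 20, 30, 40, 50],
    "A pts": [8, "X", 20, 30, 40],
    "A pts percentage": [0.80, 0.75, 0.67, 0.75, 0.80],
    "B pts": [2, 5, "X", "X", 10],
    "B pts percentage": [0.20, 0.25, 0.33, 0.25, 0.20],
})
cols = ["A pts", "B pts"]
# Assumption: percentage columns are addressed by name here, not by position.
pct_cols = ["A pts percentage", "B pts percentage"]

for col, pct_col in zip(cols, pct_cols):
    # True only for the rows where this particular column holds 'X'.
    bad = df[col] == "X"
    # Series * Series aligns on the row index, so each bad cell gets
    # its own percentage times its own total; other cells are not touched.
    df.loc[bad, col] = df.loc[bad, pct_col] * df.loc[bad, "Total Points"]
```

The points columns still have object dtype afterwards; if numeric columns are needed, converting them with pd.to_numeric is one option.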