Iterating over rows and columns and replacing values based on a condition

June 25, 2022

How do we divide by 10 every numeric value in a pandas DataFrame that lies between 10 and 100?

Conditions:

The time column, or any other non-numeric column, should be ignored.
The numbers to replace can be in any row or column.

time   n1  n2  n3   n4
11:50   1   2   3   40
12:50   5   6  70    8
13:50  80   7   6  500

Setup code, if needed:


import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }

df1 = pd.DataFrame(data=data_1)
df1

Try 1 (it doesn't seem to work):

j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1
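The attempt fails for several reasons: the frame is named `df1`, not `df`; `for i in df` iterates over the column labels rather than the cells; and `df[j][k]` looks up a column labelled `0`, which does not exist, so it raises a `KeyError`. Writing through `df[j][k] = ...` is also chained assignment, which may modify a temporary copy instead of the frame. If you really want to loop cell by cell, a minimal corrected sketch visits each row label and each numeric column and reads/writes single cells with `DataFrame.at`. Casting the numeric columns to float first is an assumption made here so that results such as 4.5 are not truncated.

```python
import pandas as pd

time = ['11:50', '12:50', '13:50']
df1 = pd.DataFrame({'time': time,
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})

# Only the numeric columns are visited, so 'time' is skipped
num_cols = df1.select_dtypes(include="number").columns

# Assumption: store the numeric columns as float so x / 10 is never truncated
df1 = df1.astype({col: float for col in num_cols})

for row in df1.index:          # row labels: 0, 1, 2
    for col in num_cols:       # column labels: 'n1' .. 'n4'
        value = df1.at[row, col]
        if 10 < value < 100:
            # .at reads and writes one cell directly, so there is no chained assignment
            df1.at[row, col] = value / 10

print(df1)
```

This works, but it runs Python code once per cell, so on large frames a vectorized approach is typically much faster.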

Expected Result:

Since 80, 70 and 40 are the only numbers lying between 10 and 100, each of them should be replaced by x/10 in the same DataFrame:

80 -> 80/10 = 8
70 -> 70/10 = 7
40 -> 40/10 = 4

The time column is ignored entirely because it is non-numeric.

Solution:

Using DataFrame.applymap is slow on large data sets because it calls a Python function once per element, so it doesn't scale well (since pandas 2.1, applymap is deprecated in favor of DataFrame.map). Look for a vectorized solution whenever possible.
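For comparison, here is a sketch of the element-wise approach the answer advises against. Because applymap was renamed DataFrame.map in pandas 2.1, this sketch applies `Series.map` column by column instead, which does the same per-element work on both older and newer pandas; `scale` is a hypothetical helper name.

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})

def scale(x):
    """Hypothetical helper: divide x by 10 if 10 < x < 100, else return x."""
    return x / 10 if 10 < x < 100 else x

num_cols = df1.select_dtypes(include="number").columns

# scale() is called once per cell from Python, which is what makes this
# approach slow on large frames compared with a vectorized mask.
for col in num_cols:
    df1[col] = df1[col].map(scale)

print(df1)
```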

In this case, you can build a boolean mask of the values strictly between 10 and 100 and replace them in one vectorized step with DataFrame.mask (or with DataFrame.where, if you negate the condition).
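`DataFrame.where` keeps values where its condition is True and replaces the rest, so the range test has to be negated to get the same result as `mask`. A minimal sketch on the question's data, using `//` to match the answer's `mask` call; `nums` and `in_range` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})

num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]

# Boolean DataFrame: True where a value lies strictly between 10 and 100
in_range = (nums > 10) & (nums < 100)

# where() keeps values where the condition is True, so ~in_range keeps
# everything outside the range and replaces the in-range values.
df1[num_cols] = nums.where(~in_range, nums // 10)

print(df1)
```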

# select the numeric columns ('time' is excluded)
num_cols = df1.select_dtypes(include="number").columns

# DataFrame.mask calls each lambda with the calling DataFrame,
# i.e. df = df1[num_cols]. `//` is floor division: it keeps the
# integer dtype but would turn e.g. 45 into 4; use `/` for true division.
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)
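The answer uses `//` (floor division), which produces the expected output here only because 40, 70 and 80 are multiples of 10. The question asks for x/10, and the two operators disagree for other values. A sketch comparing both with the same `mask` call; n4's first value is changed to a hypothetical 45 purely for illustration, and `in_range`, `floored` and `true_div` are illustrative names:

```python
import pandas as pd

# Question's data, except n4's first value is a hypothetical 45 (instead of 40)
df = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                   'n1': [1, 5, 80],
                   'n2': [2, 6, 7],
                   'n3': [3, 70, 6],
                   'n4': [45, 8, 500]})

num_cols = df.select_dtypes(include="number").columns

def in_range(d):
    """True where a value lies strictly between 10 and 100."""
    return (d > 10) & (d < 100)

floored = df[num_cols].mask(in_range, lambda d: d // 10)   # 45 becomes 4
true_div = df[num_cols].mask(in_range, lambda d: d / 10)   # 45 becomes 4.5

print(floored.at[0, 'n4'], true_div.at[0, 'n4'])
```

With `/`, any column that ends up holding a non-integer result such as 4.5 becomes float64.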

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
