- How do we divide by 10 every numeric value between 10 and 100 in an entire pandas DataFrame?
Conditions:
- The time column, or any other non-numeric column, must be ignored.
- The numbers can lie in any row or column.
| time | n1 | n2 | n3 | n4 |
|---|---|---|---|---|
| 11:50 | 1 | 2 | 3 | 40 |
| 12:50 | 5 | 6 | 70 | 8 |
| 13:50 | 80 | 7 | 6 | 500 |
Use this code if need be:
import pandas as pd
import numpy as np
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }
df1 = pd.DataFrame(data=data_1)
df1
Try 1: It doesn’t seem to work
j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1
Expected Result:
- Since 80, 70 and 40 are the only numbers between 10 and 100, each is replaced by x/10 in the same DataFrame.
- 80 -> 80/10 = 8
- 70 -> 70/10 = 7
- 40 -> 40/10 = 4
- The entire time column is ignored because it is non-numeric.
>Solution :
DataFrame.applymap calls a Python function once per element, so it is slow on a big data set and does not scale well. Look for a vectorized solution whenever one is available.
In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).
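# Select the numeric columns; 'time' holds strings, so it is skipped.
num_cols = df1.select_dtypes(include="number").columns
# DataFrame.mask accepts callables for the condition and the replacement.
# Each callable receives the calling DataFrame, so here df is df1[num_cols].
# Note: // is floor division and keeps the integer dtype (45 would become 4);
# use / instead if exact quotients such as 4.5 are needed.
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)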
Output:
>>> df1
    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
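The DataFrame.where alternative mentioned in the answer can be sketched as follows. DataFrame.where keeps values where the condition is True and replaces the rest, so the condition is negated. This is a minimal sketch that rebuilds the question's data; the names `nums` and `in_range` are introduced here for illustration only.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]  # illustrative name: numeric part only

# True where a value lies strictly between 10 and 100.
in_range = (nums > 10) & (nums < 100)

# where() keeps values where the condition is True, so pass the negation:
# values outside the range are kept, values inside it are replaced.
df1[num_cols] = nums.where(~in_range, nums // 10)
print(df1)
```

As with the mask version, `//` keeps the columns as integers. Replacing it with `/` would give exact quotients but turn the affected columns into floats.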
Setup:
import pandas as pd
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }
df1 = pd.DataFrame(data=data_1)
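To make the remark about applymap concrete, here is a rough sketch that applies the same rule both ways and checks that the results agree. The helper names `scale_vectorized` and `scale_elementwise` are hypothetical, not part of pandas. The sketch assumes a frame containing only numeric columns, and it falls back to applymap only when DataFrame.map is unavailable (DataFrame.map was added in pandas 2.1, where applymap was deprecated).

```python
import pandas as pd

def scale_vectorized(frame):
    """Divide values strictly between 10 and 100 by 10 with DataFrame.mask."""
    return frame.mask((frame > 10) & (frame < 100), frame // 10)

def scale_elementwise(frame):
    """Same rule, but one Python function call per cell (slow on big data)."""
    # Prefer DataFrame.map (pandas >= 2.1); older versions only have applymap.
    apply_each = frame.map if hasattr(frame, "map") else frame.applymap
    return apply_each(lambda x: x // 10 if 10 < x < 100 else x)

# Numeric columns from the question's example.
nums = pd.DataFrame({
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

vectorized = scale_vectorized(nums)
elementwise = scale_elementwise(nums)

# Both approaches should produce the same values; only the speed differs.
print(vectorized.values.tolist() == elementwise.values.tolist())
```

On a frame this small the difference is not measurable. The gap shows up with many rows, because the vectorized version does the comparison and division in compiled code rather than in a Python-level loop.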