Iterating over rows and columns and replacing values based on a condition

  1. How do we divide by 10 every numeric value in a pandas dataframe that lies between 10 and 100?

conditions:

  1. Time or any non-numeric column to be ignored.
  2. The numbers can lie in any row or column.
 time  n1  n2  n3   n4
11:50   1   2   3   40
12:50   5   6  70    8
13:50  80   7   6  500

Use this code if need be:


import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6 ,7],
          'n3': [3, 70 ,6],
          'n4': [40, 8, 500],
        }

df1 = pd.DataFrame(data = data_1)
df1

Try 1: It doesn’t seem to work

j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1

Expected Result:

  1. Since 80, 70, and 40 are the numbers lying between 10 and 100, each is replaced by x/10 in the same dataframe.
  • 80 -> 80/10 = 8
  • 70 -> 70/10 = 7
  • 40 -> 40/10 = 4
  2. The entire time column is ignored because it is non-numeric.

Solution:

DataFrame.applymap calls a Python function once per element, so it is slow on a big data set and doesn’t scale well. Prefer a vectorized solution whenever one is available.

In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).

# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns

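# DataFrame.mask accepts callables for the condition and the replacement;
# each one receives the calling DataFrame (here df1[num_cols]) as `df`.
# Floor division (//) yields whole numbers, matching the integer output below.
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)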

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
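
The DataFrame.where alternative mentioned above could look roughly like the sketch below. It is an illustration, not part of the original answer: the names `nums` and `in_range` are introduced here for readability, and the data is the same sample frame from the question. DataFrame.where keeps values where the condition is True, so the condition has to be negated.

```python
import pandas as pd

# Same sample data as in the question
df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Select the numeric columns so that `time` is left untouched
num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]  # hypothetical helper name, not from the answer

# True for every value strictly between 10 and 100
in_range = (nums > 10) & (nums < 100)

# where() keeps values where the condition is True and replaces the rest,
# so the negated mask keeps everything outside the range
df1[num_cols] = nums.where(~in_range, nums // 10)

print(df1)
```

If values that are not multiples of 10 can occur and fractional results are wanted (for example 45 becoming 4.5), `//` can be swapped for `/`; the affected columns then become floating point.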

Setup:

import pandas as pd

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6 ,7],
          'n3': [3, 70 ,6],
          'n4': [40, 8, 500],
        }

df1 = pd.DataFrame(data = data_1)