Normalization in pandas via function

I have this dataframe:

   age
0   48
1    7
2   62
3   48
4   51

This code:

import pandas as pd
import numpy as np

def normalizar(x):
  # Convert x to a numpy array to allow for vectorized operations.
  x = np.array(x)
  # Calculate the minimum and maximum values of x.
  xmin = x.min()
  xmax = x.max()
  # Normalize the array x using vectorized operations.
  return (x - xmin) / (xmax - xmin)

df["age_n"] = df["age"].apply(normalizar)
df

and I get:

   age  age_n
0   48    NaN
1    7    NaN
2   62    NaN
3   48    NaN
4   51    NaN

How can I solve this issue?

The expected result is a column of values in the range [0, 1].

>Solution :

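The problem is the .apply() call, which is not needed because your function is already vectorized. Series.apply() calls the function once per element, so inside normalizar both xmin and xmax equal that single value and the result is 0 / 0, which is NaN. You just need to pass the whole column to the function: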

df["age_n"] = normalizar(df["age"])
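
As a minimal, self-contained sketch of the difference (the DataFrame is rebuilt from the sample in the question, and the `np.errstate` block is only there to silence the 0 / 0 warning):

```python
import numpy as np
import pandas as pd

def normalizar(x):
    # Works on a whole array or Series at once.
    x = np.array(x)
    xmin = x.min()
    xmax = x.max()
    return (x - xmin) / (xmax - xmin)

# Sample data copied from the question.
df = pd.DataFrame({"age": [48, 7, 62, 48, 51]})

# Element-wise: each call sees one value, so xmin == xmax and 0 / 0 gives NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    per_element = df["age"].apply(normalizar)

# Whole column: min and max are taken over all rows.
df["age_n"] = normalizar(df["age"])

print(per_element.isna().all())
print(df)
```

With the whole column passed in, the smallest age (7) maps to 0 and the largest (62) maps to 1.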

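The OP also asked how it could be done with .apply(), for a case where all columns need to be normalized. That is not really a good approach, since it calls a Python function once per value instead of using vectorized operations, but here is an answer for it too. The minimum and maximum have to be computed over the whole column first and then passed to the function: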

def normalizar(x, xmin, xmax):
    return (x - xmin) / (xmax - xmin)

# and find min/max values 
xmin = df['age'].min()
xmax = df['age'].max()

# and apply normalization 
df["age_n"] = df["age"].apply(lambda x: normalizar(x, xmin, xmax))
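
If the goal is to normalize several columns, one possible sketch uses DataFrame.apply(), which passes each column to the function as a whole Series, so the minimum and maximum are taken per column. The `height` column and the name `df_n` are hypothetical and only serve the illustration:

```python
import pandas as pd

def normalizar(col):
    # col is an entire column (a pandas Series), not a single value.
    return (col - col.min()) / (col.max() - col.min())

df = pd.DataFrame({
    "age": [48, 7, 62, 48, 51],            # from the question
    "height": [170, 120, 165, 180, 175],   # hypothetical extra column
})

# DataFrame.apply runs the function once per column by default (axis=0).
df_n = df.apply(normalizar)
print(df_n)
```

Note that a column whose values are all equal would still produce NaN here, because its maximum minus its minimum is zero.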