Making functional function in Pandas

I wrote my own function in Python. It is very simple; the data and the function are shown below:

    import numpy as np
    import pandas as pd

    data_1 = {'id': ['1', '2', '3', '4', '5'],
              'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
              'employee': [10, 3, 5, 1, 0],
              'sales': [100, 30, 50, 200, 0],
             }
    df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])

    threshold_1 = 40
    threshold_2 = 50

And the function is written below:

    def my_function(employee, sales):
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
    df

So this function works well and gives the expected result.

Now I want to write the same function as a vectorized operation over Pandas Series, because vectorized operations reduce execution time. I wrote the function below, but it does not work.

    def my_function1(
            pandas_series: pd.Series
            ) -> pd.Series:
        """
        Vectorized operation across Pandas Series
        """
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column_1'] = my_function1(data['employee','sales'])

My error is probably related to the input parameters of this function. Can anybody help me solve this problem and make my_function1 work?

Solution:

You need to slightly change one condition to be able to pass Series:

(sales >= threshold_1 & employee <= threshold_2)
# equivalent to
# sales >= (threshold_1 & employee) <= threshold_2

into:

(sales >= threshold_1) & (employee <= threshold_2)

because & binds more tightly than >= and <=, so the original condition was not grouped as intended.
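
To illustrate why the parentheses matter, here is a small sketch using the thresholds and sample columns from the question. The names scalar_result, error_message and ok are introduced here for illustration only. Without parentheses the expression becomes a chained comparison, which happens to evaluate on plain integers but fails on Series:

```python
import pandas as pd

threshold_1 = 40
threshold_2 = 50

employee = pd.Series([10, 3, 5, 1, 0])
sales = pd.Series([100, 30, 50, 200, 0])

# With plain integers, the unparenthesized form is parsed as
# 100 >= (40 & 10) <= 50, a chained comparison that Python can evaluate.
scalar_result = 100 >= threshold_1 & 10 <= threshold_2

# With Series, the chained comparison needs an implicit `and` between
# boolean Series, which pandas refuses to evaluate.
try:
    sales >= threshold_1 & employee <= threshold_2
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Parenthesizing each comparison yields an element-wise boolean Series.
ok = (sales >= threshold_1) & (employee <= threshold_2)
print(ok.tolist())
```

This would explain why the row-wise apply version appeared to work: each call received scalars, so the chained comparison ran without raising, even though it was not testing the intended condition.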

def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column'] = my_function(df['employee'], df['sales'])

output:

  id      name  employee  sales  new_column
0  1  Company1        10    100         400
1  2  Company1         3     30          60
2  3  Company3         5     50         200
3  4  Company4         1    200         800
4  5  Company5         0      0           0
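
One detail worth checking: np.select takes the value of the first condition that is true, and rows matching none of the conditions receive the default argument, which is 0 unless specified. Every row in the sample data matches a condition, but a row with sales >= threshold_1 and employee > threshold_2 would silently get 0. The sketch below adds one such hypothetical row (not part of the original data) and uses -1 as an arbitrary marker value:

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

# The last row is hypothetical: high sales and more employees than threshold_2.
employee = pd.Series([10, 3, 60])
sales = pd.Series([100, 30, 100])

conditions = [
    sales == 0,
    sales < threshold_1,
    (sales >= threshold_1) & (employee <= threshold_2),
]
values = [0, sales * 2, sales * 4]

# default=-1 marks rows that satisfy none of the conditions.
estimate = np.select(conditions, values, default=-1)
print(estimate.tolist())  # [400, 60, -1]
```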

You can also pass the whole dataframe and subset the columns inside the function:

def my_function(df):
    employee = df['employee']
    sales = df['sales']
    conditions = [
    (sales == 0 ),
    (sales < threshold_1), 
    (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)    
    return (sales_estimation)

df['new_column'] = my_function(df)
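
As a sanity check, the vectorized function can be compared against a row-wise apply version on the sample data. This is only a sketch: estimate_row and estimate_vectorized are names introduced here for illustration, and the row-wise version uses plain if statements rather than the np.select call from the question. Because np.select returns a NumPy array, the sketch also wraps the result in a pd.Series that reuses the DataFrame index, in line with the -> pd.Series annotation in the question:

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
    'employee': [10, 3, 5, 1, 0],
    'sales': [100, 30, 50, 200, 0],
})

def estimate_row(employee, sales):
    # Row-wise reference implementation (illustrative, scalar inputs).
    if sales == 0:
        return 0
    if sales < threshold_1:
        return sales * 2
    if employee <= threshold_2:
        return sales * 4
    return 0  # same fallback as np.select's default of 0

def estimate_vectorized(frame: pd.DataFrame) -> pd.Series:
    employee = frame['employee']
    sales = frame['sales']
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    # Wrap the ndarray so the result keeps the DataFrame's index.
    return pd.Series(np.select(conditions, values), index=frame.index)

row_wise = df.apply(lambda x: estimate_row(x.employee, x.sales), axis=1)
vectorized = estimate_vectorized(df)
print(vectorized.tolist())
```

If the two results agree on representative data, the apply call can be replaced by the vectorized function.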