Home How can I efficiently optimize the creation of conditional columns based on multiple columns in Pandas DataFrames?

Questions

How can I efficiently optimize the creation of conditional columns based on multiple columns in Pandas DataFrames?

April 5, 2024

I have a dataframe with over 20 thousand rows and need to create a column based on more than 10 conditions from four other columns.

Instead of writing multiple lines of .loc, I decided to create a function. To enhance the performance and readability of this function, I opted to group the necessary columns into named_tuples using the pandas itertuples method and encapsulate it within the functools cache function, significantly improving the performance of the created function, albeit at the cost of initially grouping the columns into tuples. Here’s a simplified sample of what I did:

import pandas as pd
from functools import cache

transform_counts = 0
@cache
def transform(row):
  global transform_counts
  transform_counts += 1
  if row.A == 1 and row.B == 23:
    return 33
  else:
    return 40

df = pd.DataFrame({'A': [1, 2, 3, 4, 1, 2], 'B': [23, 43, 89, 50, 23, 43]})
df['C'] = list(df[['A', 'B']].itertuples(index=False))
df['C'] = df['C'].map(transform)

In this sample the transform_counts was 4 and the dataframe looked like this:


    A   B   C
0   1   23  33
1   2   43  40
2   3   89  40
3   4   50  40
4   1   23  33
5   2   43  40

Is there any way to further optimize the code, either by reducing the transform_counts or by skipping the need to group the necessary columns into tuples?

>Solution :

Assuming your transform function is too complicated to use the methods given in this question, you could just use apply with a wrapper function to your cached function. This would save the itertuples loop. For example:

import pandas as pd
from functools import cache

transform_counts = 0
@cache
def transform(a, b):
  global transform_counts
  transform_counts += 1
  if a == 1 and b == 23:
    return 33
  else:
    return 40

df = pd.DataFrame({'A': [1, 2, 3, 4, 1, 2], 'B': [23, 43, 89, 50, 23, 43]})
df['C'] = df.apply(lambda row:transform(row.A, row.B), axis=1)
print(df, transform_counts, sep='\n\n')

Output:

   A   B   C
0  1  23  33
1  2  43  40
2  3  89  40
3  4  50  40
4  1  23  33
5  2  43  40

4

pandas

byMR

Published April 05, 2024

Add a comment

Why is this for-loop stuck on the second iteration in C?

byMR

April 5, 2024

Questions

selectInput does not return ALL and convert factor to number in shiny

byMR

April 5, 2024

Questions

RangeError: Maximum call stack size exceeded – Typescript

byMR

April 5, 2024

Questions

I'm doing authentication using Google OAuth. How do I fetch the details of the user

byMR

April 5, 2024

Questions

Separating a column based on a string pattern in R using tidyverse functions

byMR

April 5, 2024

Questions

Python zlib not accepting 3 arguments?

byMR

April 5, 2024

How can I efficiently optimize the creation of conditional columns based on multiple columns in Pandas DataFrames?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Why is this for-loop stuck on the second iteration in C?

selectInput does not return ALL and convert factor to number in shiny

RangeError: Maximum call stack size exceeded – Typescript

I'm doing authentication using Google OAuth. How do I fetch the details of the user

Separating a column based on a string pattern in R using tidyverse functions

Python zlib not accepting 3 arguments?

Keep Up to Date with the Most Important News

How can I efficiently optimize the creation of conditional columns based on multiple columns in Pandas DataFrames?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Why is this for-loop stuck on the second iteration in C?

selectInput does not return ALL and convert factor to number in shiny

RangeError: Maximum call stack size exceeded – Typescript

I'm doing authentication using Google OAuth. How do I fetch the details of the user

Separating a column based on a string pattern in R using tidyverse functions

Python zlib not accepting 3 arguments?

Discover more from Dev solutions