Vectorizing a Function to Replicate Rows with Pandas

March 23, 2022

CONTEXT:

I have a DataFrame and a function that duplicates each row according to the number in its "Count" column. My current method is very slow on larger datasets:

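import pandas as pd

def replicate_row(df):
    # Slow: every pd.concat call below builds a brand-new DataFrame.
    full_df = df.copy()  # assumed starting point: the original rows
    for i in range(len(df)):
        row = df.iloc[i]
        if row['Count'] > 0:
            rep = int(row['Count']) - 1  # extra copies beyond the original
            if rep != 0:
                # DataFrame.append was removed in pandas 2.0; use pd.concat.
                full_df = pd.concat([full_df, df.iloc[[i] * rep]], ignore_index=True)
    return full_df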

I’m trying to figure out how to vectorize this function to run quicker and found this so far:

def vector_function(
    pandas_series: pd.Series) -> pd.Series:
    scaled_series = pandas_series['Count'] - 1
    # *** vectorized replication code here ? ***
    return scaled_series

SAMPLE DATA

Name    Age    Gender    Count
Jen     25     F         3
Paul    30     M         2

The expected output DataFrame would be:

Name    Age    Gender    
Jen     25     F         
Jen     25     F         
Jen     25     F         
Paul    30     M         
Paul    30     M

>Solution :

Try using pd.Index.repeat:

df = df.loc[df.index.repeat(df['Count'])].reset_index(drop=True).drop('Count', axis=1)

Output:

>>> df
   Name  Age Gender
0   Jen   25      F
1   Jen   25      F
2   Jen   25      F
3  Paul   30      M
4  Paul   30      M
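To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. It rebuilds the sample data from the question; the names `repeated_labels` and `result` are only illustrative, and the sketch assumes the DataFrame has a unique index (the default RangeIndex here).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Jen", "Paul"],
    "Age": [25, 30],
    "Gender": ["F", "M"],
    "Count": [3, 2],
})

# Step 1: repeat each index label as many times as its Count says.
repeated_labels = df.index.repeat(df["Count"])  # labels 0, 0, 0, 1, 1

# Step 2: select rows by those labels; a repeated label yields a repeated row.
# Step 3: renumber the rows and drop the helper column.
result = (
    df.loc[repeated_labels]
      .reset_index(drop=True)
      .drop(columns="Count")
)
print(result)
```

Note that a row whose Count is 0 gets no labels in step 1, so it disappears from the result. If the index might contain duplicate labels, selecting by position with `df.iloc` and repeated positions is likely the safer variant.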

byMR

Published March 23, 2022
