Vectorizing a Function to Replicate Rows with Pandas

CONTEXT:

I have a DataFrame and a function that duplicates each row according to the number in its "Count" column. My current method is very slow on larger datasets:

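import pandas as pd

def replicate_row(df):
    # Assumed starting point: the original snippet never defined full_df
    full_df = df.copy()
    for i in range(len(df)):
        row = df.iloc[i]
        if row['Count'] > 0:
            rep = int(row['Count']) - 1
            if rep != 0:
                # DataFrame.append was removed in pandas 2.0, so use pd.concat
                full_df = pd.concat([full_df, pd.DataFrame([row] * rep)], ignore_index=True)
    return full_df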

I’m trying to vectorize this function so it runs faster, and this is what I have so far:

def vector_function(
    pandas_series: pd.Series) -> pd.Series:
    scaled_series = pandas_series['Count'] - 1
    # *** vectorized replication code here? ***
    return scaled_series

SAMPLE DATA

Name    Age    Gender    Count
Jen     25     F         3
Paul    30     M         2

The expected outcome of DF would be:

Name    Age    Gender    
Jen     25     F         
Jen     25     F         
Jen     25     F         
Paul    30     M         
Paul    30     M         

Solution:

Try using pd.Index.repeat:

df = df.loc[df.index.repeat(df['Count'])].reset_index(drop=True).drop('Count', axis=1)

Output:

>>> df
   Name  Age Gender
0   Jen   25      F
1   Jen   25      F
2   Jen   25      F
3  Paul   30      M
4  Paul   30      M
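
For anyone who wants to run this end to end, here is a minimal sketch that builds the sample data from the question and wraps the one-liner in a function. The helper name replicate_rows and its count_col parameter are made up for illustration, and the sketch assumes the DataFrame has a unique index (with duplicate labels, .loc would return extra rows). Clipping negative counts to zero is also an assumption on my part, added because Index.repeat raises on negative values.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Jen", "Paul"],
    "Age": [25, 30],
    "Gender": ["F", "M"],
    "Count": [3, 2],
})

def replicate_rows(frame: pd.DataFrame, count_col: str = "Count") -> pd.DataFrame:
    """Hypothetical helper: repeat each row count_col times, then drop count_col."""
    # Assumption: missing or negative counts are treated as 0
    counts = frame[count_col].fillna(0).astype(int).clip(lower=0)
    # Each index label is repeated according to the matching count
    repeated_index = frame.index.repeat(counts)
    # Select the repeated labels in one vectorized lookup
    return (
        frame.loc[repeated_index]
        .drop(columns=count_col)
        .reset_index(drop=True)
    )

result = replicate_rows(df)
print(result)
```

Note that a row whose Count is 0 is dropped entirely with this approach, since it is repeated zero times.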