Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Adding Rows to a pandas dataframe, but running into index issues when using append

I have a data frame with about 20k rows and about 10 columns. One of the columns has a list in it between len=1 and len=4.

If a list has more than one entry in it, I want to duplicate that row and append it to the bottom. The second table is what I want to be added.

Index Col 1 Col 2 List Col List Len
1 ABC CDE [‘String1’, ‘String2’] 2
2 EFG HIJ [‘String3’] 1
Index Col 1 Col 2 List Col List Len
3 ABC CDE [‘String2’] 2

and change the first row to

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Index Col 1 Col 2 List Col List Len
1 ABC CDE [‘String1’] 2

Apologies for the really terrible formatting. When I created a second table to show everything, Stack Overflow viewed it as unformatted code and I couldn’t post it.

The index is the automatically generated index and not relevant to my data.

#Finding the max number of list len
max_list_len= max(original_df['List Col'])
#Set my counter variable
number=2
#Begin loop    
while number<=max_list_len:
    #copies relevant rows into new data frame
    dataframe_being_duplicated = original_df.loc[(original_df['List Col']).eq(number)] 
    dataframe_being_duplicated.loc[(original_df['List Len']).eq(number), 'List Col'] = dataframe_being_duplicated.loc[(original_df['List Len']).eq(number)]['List Col'].str[number-1]   
    full_quote_df = pd.concat(full_quote_df,dataframe_being_duplicated,ignore_index=True)
    number+=1 #increment number

With this, it throws an error that says "cannot reindex on an axis with duplicate labels"

I’m also now realizing that this process for rows where the len is larger than 2, because it only duplicates the row once (if it worked as intended).

Is there a faster way to ‘unstack’ rows like this?

>Solution :

If you really want to duplicate the rows:

out = df.loc[df.index.repeat(df['List Col'].str.len())]

Output:

Index   Col 1   Col 2   List Col    List Len
1   ABC CDE ['String1', 'String2']  2
1   ABC CDE ['String1', 'String2']  2
2   EFG HIJ ['String3'] 1

Else a classical explode:

df.explode('List Col')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading