Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Appending column values from one dataframe to another as a list

I have dozens of very similar dataFrames. What I want is to combine all ‘VALUE’ column values from each into lists, and return a dataFrame where the ‘VALUE’ column is comprised of these lists. I only want to do this for rows where ‘PV’ contains a substring from a list of substrings.

I came up with one way I thought would work, but it’s real nasty and doesn’t work anyways (stopped it at 3m). There has to be a better way of doing this, does anyone here have any ideas? Thanks for any and all help.

import pandas as np
# Example dataFrames
df0 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
DATAFRAMES

df0 dataFrame          df1 dataFrame

PV   VALUE             PV   VALUE
pv1    1               pv1    5
pv2    2               pv2    6
pv3    3               pv3    7
pv4    4               pv4    8
# Nasty code I thought might work
strings = ['v2', 'v4']
for i, row0 in df0.iterrows():
    for j, row1 in df1.iterrows():
        if (row0['PV']==row1['PV']) & any(substring in row0['PV'] for substring in strings):
            df0.at[i,'VALUE'] = [row0['VALUE'], row1['VALUE']]

Desired result:

PV   VALUE
pv1    1
pv2  [2,6]
pv3    3
pv4  [4,8]

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You could filter df1 for rows that contain strings; concatenate it with df0; then groupby + agg(list) can aggregate "VALUE"s for each "PV".

Finally, you could use mask to take out the elements from the singleton lists.

out = (pd.concat([df0, df1[df1['PV'].str.contains('|'.join(strings))]])
       .groupby('PV', as_index=False)['VALUE'].agg(list))
out['VALUE'] = out['VALUE'].mask(out['VALUE'].str.len().eq(1), out['VALUE'].str[0])

Alternatively, we could make the values in the "VALUE" columns lists and merge + concatenate the lists:

df0['VALUE'] = df0['VALUE'].apply(lambda x: [x])
df1['VALUE'] = df1['VALUE'].where(df1['PV'].str.contains('|'.join(strings)), df1['VALUE'].max()+1).apply(lambda x: [x] if x <= df1['VALUE'].max() else [])
out = df0.merge(df1, on='PV').set_index('PV').sum(axis=1)
out = out.mask(out.str.len().eq(1), out.str[0]).reset_index(name='VALUE')

Output:

    PV   VALUE
0  pv1       1
1  pv2  [2, 6]
2  pv3       3
3  pv4  [4, 8]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading