I have dozens of very similar dataFrames. What I want is to combine all ‘VALUE’ column values from each into lists, and return a dataFrame where the ‘VALUE’ column is comprised of these lists. I only want to do this for rows where ‘PV’ contains a substring from a list of substrings.
I came up with one way I thought would work, but it’s real nasty and doesn’t work anyways (stopped it at 3m). There has to be a better way of doing this, does anyone here have any ideas? Thanks for any and all help.
import pandas as np
# Example dataFrames
df0 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
DATAFRAMES
df0 dataFrame df1 dataFrame
PV VALUE PV VALUE
pv1 1 pv1 5
pv2 2 pv2 6
pv3 3 pv3 7
pv4 4 pv4 8
# Nasty code I thought might work
strings = ['v2', 'v4']
for i, row0 in df0.iterrows():
for j, row1 in df1.iterrows():
if (row0['PV']==row1['PV']) & any(substring in row0['PV'] for substring in strings):
df0.at[i,'VALUE'] = [row0['VALUE'], row1['VALUE']]
Desired result:
PV VALUE
pv1 1
pv2 [2,6]
pv3 3
pv4 [4,8]
>Solution :
You could filter df1 for rows that contain strings; concatenate it with df0; then groupby + agg(list) can aggregate "VALUE"s for each "PV".
Finally, you could use mask to take out the elements from the singleton lists.
out = (pd.concat([df0, df1[df1['PV'].str.contains('|'.join(strings))]])
.groupby('PV', as_index=False)['VALUE'].agg(list))
out['VALUE'] = out['VALUE'].mask(out['VALUE'].str.len().eq(1), out['VALUE'].str[0])
Alternatively, we could make the values in the "VALUE" columns lists and merge + concatenate the lists:
df0['VALUE'] = df0['VALUE'].apply(lambda x: [x])
df1['VALUE'] = df1['VALUE'].where(df1['PV'].str.contains('|'.join(strings)), df1['VALUE'].max()+1).apply(lambda x: [x] if x <= df1['VALUE'].max() else [])
out = df0.merge(df1, on='PV').set_index('PV').sum(axis=1)
out = out.mask(out.str.len().eq(1), out.str[0]).reset_index(name='VALUE')
Output:
PV VALUE
0 pv1 1
1 pv2 [2, 6]
2 pv3 3
3 pv4 [4, 8]