I’m trying to get every combination possible using a list of columns so that I can perform an aggregate function based on column i which is not in the columns list.
I was able to use the code below to get the aggregate for each column individually but am struggling to get the combinations when using more than one column. (i.e (‘a’,’b’,’i’), (‘a’,’c’,’i’), (‘b’,’c’,’i’) …)
I’ve tried using itertools.combination() but ran into memory issues and also had trouble turning it into a database.
I’ve also tried using pd.groupby() but that only gives me the combination using every column and not the combinations using a subset of the columns.
columns_loop = ['a','b','c','d','e','f','g','h']
list_of_df = []
for col in columns_loop:
df2 = df_raw.groupby([col, 'i']).agg(value= ('i', 'count'))
df = pd.DataFrame(df2)
df = df.reset_index()
list_of_df.append(df)
all_df = pd.concat(list_of_df)
>Solution :
You can try a combination of Iter tools for generation and Pandas for aggregation.
columns_loop = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
list_of_df = []
for r in range(1, len(columns_loop) + 1):
for combo in itertools.combinations(columns_loop, r):
df2 = df_raw.groupby(list(combo) + ['i']).agg(value=('i', 'count')).reset_index()
list_of_df.append(df2)
all_df = pd.concat(list_of_df, ignore_index=True)
import ace_tools as tools; tools.display_dataframe_to_user(name="Aggregated Combinations DataFrame", dataframe=all_df)
This should generate the combinations and perform the grouping and aggregation.