Suppose I have a function that does something based on a DataFrame but doesn’t return a value, such as writing to a log file or uploading to a database.
def some_func_no_return_value(df):
    # Do something based on df
    return  # Doesn't return anything
I have some code that does this:
for key, df_group in df.groupby(some_column):
    some_func_no_return_value(df_group)
Should the below code work the same? For me, it doesn’t do anything, and I’m wondering if that’s expected or if I’ve made a mistake somehow:
df.groupby(some_column).apply(lambda df_group : some_func_no_return_value(df_group))
Thanks for your help.
>Solution :
Well, you can, the same way that you can use a list comprehension for side effects:
[print(x) for x in iterable]
But you shouldn’t.
This is frowned upon because it creates a useless object as output (here, an empty DataFrame).
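To make the list comprehension analogy concrete, here is a minimal sketch. The names `items` and `leftovers` are illustrative and not from the question:

```python
# Illustrative names only; `items` stands in for any iterable.
items = ['a', 'b', 'c']

# The comprehension prints each item, but it also builds a list
# holding print's return value (None) for every element.
leftovers = [print(x) for x in items]
print(leftovers)  # [None, None, None]

# The plain loop has the same side effect and builds nothing.
for x in items:
    print(x)
```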
The loop is the correct way to proceed:
for key, df_group in df.groupby(some_column):
    some_func_no_return_value(df_group)
Example:
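import pandas as pd

df = pd.DataFrame({'col1': list('ABABA'),
                   'col2': range(5)})
# apply calls print on each group, but also builds a return value
out = df.groupby('col1').apply(print)
print(type(out))
# plain loop: same side effect, no leftover object
for _, g in df.groupby('col1'):
    print(g)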
Output:
  col1  col2    # groupby side-effect
0    A     0
2    A     2
4    A     4
  col1  col2
1    B     1
3    B     3
<class 'pandas.core.frame.DataFrame'>    # useless output
  col1  col2    # loop
0    A     0
2    A     2
4    A     4
  col1  col2
1    B     1
3    B     3
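The same loop pattern works for side effects other than print. Below is a minimal sketch, assuming a hypothetical `record_group` function and an in-memory list `uploaded` that stand in for the log file or database from the question. Neither name comes from the original code.

```python
import pandas as pd

# Hypothetical stand-in for a log file or database table.
uploaded = []

def record_group(key, df_group):
    """Side effect only: store the group key and its row count; returns None."""
    uploaded.append((key, len(df_group)))

df = pd.DataFrame({'col1': list('ABABA'),
                   'col2': range(5)})

# Iterate over the groups directly; nothing is collected or returned.
for key, df_group in df.groupby('col1'):
    record_group(key, df_group)

print(uploaded)
```

Because the loop also gives you the group key, it is easy to pass it along to the side-effect function, which the apply version does not do as directly.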