Sum specific section of column within list of data frames iterative (Python)

I have a list of DataFrames (named "Input" in the code below) whose length (the number of DataFrames) can vary. For each DataFrame, I want to sum the sections of each column that share the same year in the first column. In the end, the code should return the same list of DataFrames, but with the summed values in each column (named "output" in the code below). So, I want to get from "Input" to "output".

import pandas as pd

# Input
l1 = [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023]
l2 = [1,2,3,4,5,6,7,8]
l3 = [2,3,4,5,6,7,8,9]
l4 = [1,2,3,4,5,6,7,8]
l5 = [2,3,4,5,6,7,8,9]
l6 = [1,2,3,4,5,6,7,8]


df00_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)
df01_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)
df02_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)

# list of input dfs
Input = [df00_in, df01_in, df02_in]


###################################


# Output
l7 = [2022, 2023]
l8 = [sum(l2[:4]), sum(l2[4:])]
l9 = [sum(l3[:4]), sum(l3[4:])]
l10 = [sum(l4[:4]), sum(l4[4:])]
l11 = [sum(l5[:4]), sum(l5[4:])]
l12 = [sum(l6[:4]), sum(l6[4:])]


df00_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)
df01_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)
df02_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)


# list of output dfs
output = [df00_out, df01_out, df02_out]

In the example above each year appears only four times (quarterly), but a year may also appear 12 times (monthly), in which case the DataFrames expand accordingly. So my idea is some kind of "SumIf: years are the same".
Any ideas on that? Thanks!


Solution:

You are constructing your DataFrames in an unusual way; you can simply pass a list of lists (or a list of dicts, a dict of lists, a dict of dicts, etc.) to pd.DataFrame. You should check the documentation on DataFrame construction.
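For instance, a minimal sketch of two of those constructor forms (the column names here are arbitrary illustrations, not from the original post):

```python
import pandas as pd

# dict of lists: keys become column names, each list is a column
df_a = pd.DataFrame({"year": [2022, 2022, 2023], "value": [1, 2, 3]})

# list of lists: each inner list is a row; columns named explicitly
df_b = pd.DataFrame([[2022, 1], [2022, 2], [2023, 3]],
                    columns=["year", "value"])
```

Both calls produce an equivalent 3-row, 2-column DataFrame without any pd.concat.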

For your question, you can simply use groupby (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html).
With groupby you can group a DataFrame by one or several columns (rows are put together when they have the same value in those columns), and there is even a sum method you can apply directly to the result.

Like this:

import pandas as pd

# Input
l1 = [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023]
l2 = [1,2,3,4,5,6,7,8]
l3 = [2,3,4,5,6,7,8,9]
l4 = [1,2,3,4,5,6,7,8]
l5 = [2,3,4,5,6,7,8,9]
l6 = [1,2,3,4,5,6,7,8]


df00_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T
df01_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T
df02_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T

# list of input dfs
Input = [df00_in, df01_in, df02_in]

###################################
output1 = [df.groupby(by=[0]).sum().reset_index() for df in Input]

Note that I added reset_index() to get exactly the same output as yours, but it is not strictly necessary.
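Because groupby collects rows by value rather than by position, the same line also covers the monthly case from the question without any change. A quick sanity check with hypothetical monthly data (each year appearing 12 times):

```python
import pandas as pd

# hypothetical monthly input: each year appears 12 times
years = [2022] * 12 + [2023] * 12
values = list(range(1, 25))  # 1..24
df = pd.DataFrame({0: years, 1: values})

# same groupby-sum as in the solution above
result = df.groupby(by=[0]).sum().reset_index()
# 2022 -> sum(1..12) = 78, 2023 -> sum(13..24) = 222
```

The grouping key is the year value itself, so four, twelve, or any other number of rows per year all collapse to one row per year.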
