Sum specific section of column within list of data frames iterative (Python)

I have a list of DataFrames (named "Input" in the code below) whose length (the number of DataFrames) can vary. For each DataFrame, I want to sum the sections of each column that share the same year in the first column. In the end, the code should return the same list of DataFrames, but with the summed values in each column (named "output" in the code below). So, I want to get from "Input" to "output".

import pandas as pd

# Input
l1 = [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023]
l2 = [1,2,3,4,5,6,7,8]
l3 = [2,3,4,5,6,7,8,9]
l4 = [1,2,3,4,5,6,7,8]
l5 = [2,3,4,5,6,7,8,9]
l6 = [1,2,3,4,5,6,7,8]


df00_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)
df01_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)
df02_in = pd.concat([pd.DataFrame(l1), pd.DataFrame(l2), 
                     pd.DataFrame(l3), pd.DataFrame(l4), pd.DataFrame(l5), pd.DataFrame(l6)], axis = 1)

# list of input dfs
Input = [df00_in, df01_in, df02_in]


###################################


# Output
l7 = [2022, 2023]
l8 = [sum(l2[:4]), sum(l2[4:])]
l9 = [sum(l3[:4]), sum(l3[4:])]
l10 = [sum(l4[:4]), sum(l4[4:])]
l11 = [sum(l5[:4]), sum(l5[4:])]
l12 = [sum(l6[:4]), sum(l6[4:])]


df00_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)
df01_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)
df02_out = pd.concat([pd.DataFrame(l7), pd.DataFrame(l8), 
                      pd.DataFrame(l9), pd.DataFrame(l10), pd.DataFrame(l11), pd.DataFrame(l12)], axis = 1)


# list of output dfs
output = [df00_out, df01_out, df02_out]

In the example above each year appears only four times (quarterly), but a year may also appear 12 times (monthly), in which case the DataFrames expand accordingly. So my idea is some kind of "SumIf: years are the same".
Any ideas on that? Thanks!


Solution:

You are constructing your DataFrames in an unusual way; you can simply pass a list of lists (or a list of dicts, a dict of lists, a dict of dicts, etc.) to pd.DataFrame. You should check the documentation on DataFrame construction.
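For instance, a minimal sketch of two of those constructor forms (the column names here are arbitrary illustrations, not from the original post):

```python
import pandas as pd

# dict of lists: keys become column names, each list is a column
df_a = pd.DataFrame({"year": [2022, 2022, 2023], "value": [1, 2, 3]})

# list of lists: each inner list is a row; columns named explicitly
df_b = pd.DataFrame([[2022, 1], [2022, 2], [2023, 3]],
                    columns=["year", "value"])
```

Both calls produce an equivalent 3-row, 2-column DataFrame without any pd.concat.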

For your question, you can simply use groupby (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html).
With groupby you can group a DataFrame by one or several columns (rows are put together when they have the same value in those columns), and there is even a sum method you can apply directly to the result.

Like this:

import pandas as pd

# Input
l1 = [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023]
l2 = [1,2,3,4,5,6,7,8]
l3 = [2,3,4,5,6,7,8,9]
l4 = [1,2,3,4,5,6,7,8]
l5 = [2,3,4,5,6,7,8,9]
l6 = [1,2,3,4,5,6,7,8]


df00_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T
df01_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T
df02_in = pd.DataFrame([l1,l2,l3,l4,l5,l6]).T

# list of input dfs
Input = [df00_in, df01_in, df02_in]

###################################
output1 = [df.groupby(by=[0]).sum().reset_index() for df in Input]

Note that I added reset_index() to get exactly the same output as yours, but it is not strictly necessary.
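Because groupby collects rows by value rather than by position, the same line also covers the monthly case from the question without any change. A quick sanity check with hypothetical monthly data (each year appearing 12 times):

```python
import pandas as pd

# hypothetical monthly input: each year appears 12 times
years = [2022] * 12 + [2023] * 12
values = list(range(1, 25))  # 1..24
df = pd.DataFrame({0: years, 1: values})

# same groupby-sum as in the solution above
result = df.groupby(by=[0]).sum().reset_index()
# 2022 -> sum(1..12) = 78, 2023 -> sum(13..24) = 222
```

The grouping key is the year value itself, so four, twelve, or any other number of rows per year all collapse to one row per year.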
