Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SUM all data in the same date and group data in certain range

Let’s say I have a dataframe

df = {
'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
'Value1': [9,12,7,8,15,16],
'Value2': [1,3,2,4,2,3]}

I want to sum Value2 in 2 groups, based on Value1(1-10,11-20) and date
and create a new dataframe
with 3 columns
Date,1-10,11-20.

I also want the range be extendable
like 1-10,11-20,21-30…..>100
so i am thinking of something like

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

for i in range(0, 100, 10)

but no idea how to put the date condition

so that the final result is like:

new_df = {
'Date': ['1-8','2-8'],
'1-10': [3,4],
'11-12':[3,5]}

>Solution :

The following code should do what you expect:

import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
    'Value1': [9,12,7,8,15,16],
    'Value2': [1,3,2,4,2,3]
})

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], range(0, 110, 10))

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)

This code delivers code ranges up to (90, 100]. In order to only get relavant ranges you could modify the code like this:

import math
import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
    'Value1': [19,22,17,18,25,26],
    'Value2': [1,3,2,4,2,3]
})

# get the minimum and maximum values in Value1
lower_bound = df.Value1.min()
upper_bound = df.Value1.max()

# round down the lower bound to the nearest multiple of 10
lower_bound = math.floor(lower_bound/10)*10

# create a range of values from the lower bound to the upper bound, in increments of 10
value_range = range(lower_bound, upper_bound+10, 10)

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range)

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)

If you really want the ranges in the format 10-20, you could modify the code as follows:

# create labels for the value range in the format '10-20'
labels = [f"{lower}-{upper}" for lower, upper in zip(value_range[:-1], value_range[1:])]

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, labels=labels)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading