SUM all data in the same date and group data in certain range

December 29, 2022

Let’s say I have a dataframe

df = {
'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
'Value1': [9,12,7,8,15,16],
'Value2': [1,3,2,4,2,3]}

I want to sum Value2 in 2 groups, based on Value1(1-10,11-20) and date
and create a new dataframe
with 3 columns
Date,1-10,11-20.

I also want the range be extendable
like 1-10,11-20,21-30…..>100
so i am thinking of something like

for i in range(0, 100, 10)

but no idea how to put the date condition

so that the final result is like:

new_df = {
'Date': ['1-8','2-8'],
'1-10': [3,4],
'11-12':[3,5]}

>Solution :

The following code should do what you expect:

import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
    'Value1': [9,12,7,8,15,16],
    'Value2': [1,3,2,4,2,3]
})

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], range(0, 110, 10))

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)

This code delivers code ranges up to (90, 100]. In order to only get relavant ranges you could modify the code like this:

import math
import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
    'Value1': [19,22,17,18,25,26],
    'Value2': [1,3,2,4,2,3]
})

# get the minimum and maximum values in Value1
lower_bound = df.Value1.min()
upper_bound = df.Value1.max()

# round down the lower bound to the nearest multiple of 10
lower_bound = math.floor(lower_bound/10)*10

# create a range of values from the lower bound to the upper bound, in increments of 10
value_range = range(lower_bound, upper_bound+10, 10)

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range)

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)

If you really want the ranges in the format 10-20, you could modify the code as follows:

# create labels for the value range in the format '10-20'
labels = [f"{lower}-{upper}" for lower, upper in zip(value_range[:-1], value_range[1:])]

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, labels=labels)