Group by Number, different size groups

September 19, 2024

I have a dataset where i show ticket purchases by number of days out from event (NDO).

NDO	Purch	Country
0	3	UK
1	3	UK
1	3	UK
1	3	USA
2	3	USA

I want to split the data in groups by NDO, eg 0 days, 1-5 days, 5-10 days, 50-100 days, and add an extra column, ‘NDOGroup’to let me then create a pivot table by NDOGroup, and ultimately a heatmap.

The code I used for this is:

bins = [-1,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,101]
labels = [f'{a0}-{a1 - 1}' for a0, a1 in zip(bins[:-2], bins[1:-1])] + [f'≥{bins[-2]}']
df['NDOGroup'] = pd.cut(df['NDO'], bins=bins, labels=labels, right=True)

countrypt = df.pivot_table(index = 'NDOGroup',columns = 'Country', values ='Purch', aggfunc='sum', observed=False)

plt.figure(figsize = (15,9))
sns.heatmap(countrypt, cmap = 'viridis', vmin=0, linewidths=0.5, linecolor='black')

but the heatmap produced is like so:

and when I examine the data I see that for NDO=0, the NDOGroup = NaN

NDO	Purch	Country	NDOGroup
0	3	UK	Nan
1	3	UK	0-4
1	3	UK	0-4
1	3	USA	0-4
2	3	USA	0-4

I have tried different indexing in the bins e.g. [-1,0,5], [-1,4,5] but can’t seem to get it to show the correct groups.

Is this an indexing issue? Or is there a much cleaner way possible to fix this so i get a table like below:

NDO	Purch	Country	NDOGroup
0	3	UK	0
1	3	UK	1-5
3	3	UK	1-5
4	3	USA	1-5
5	3	USA	1-5
6	3	USA	6-10
12	3	USA	11-15

>Solution :

You could use:

df = pd.DataFrame({'NDO': [0, 1, 2, 5, 6, 10, 11, 15, 16, 91, 92, 96, 97, 1000]})

bins = [-np.inf,0,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,np.inf]
labels = ['0']+[f'{a0+1}-{a1}' for a0, a1 in zip(bins[1:-2], bins[2:-1])] + [f'≥{bins[-2]}']

df['NDOGroup'] = pd.cut(df['NDO'], bins=bins, labels=labels, right=True)

Variant:

from itertools import pairwise
bins = [-np.inf,0,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,np.inf]
labels = [f'{a0+1}-{a1}' for a0, a1 in pairwise(bins)]
labels[0] = '0'
labels[-1] = f'≥{bins[-2]}'

Output:

     NDO NDOGroup
0      0        0
1      1      1-5
2      2      1-5
3      5      1-5
4      6     6-10
5     10     6-10
6     11    11-16
7     15    11-16
8     16    11-16
9     91    87-91
10    92    92-96
11    96    92-96
12    97      ≥96
13  1000      ≥96