Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group by Number, different size groups

I have a dataset where i show ticket purchases by number of days out from event (NDO).

NDO Purch Country
0 3 UK
1 3 UK
1 3 UK
1 3 USA
2 3 USA

I want to split the data in groups by NDO, eg 0 days, 1-5 days, 5-10 days, 50-100 days, and add an extra column, ‘NDOGroup’to let me then create a pivot table by NDOGroup, and ultimately a heatmap.

The code I used for this is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

bins = [-1,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,101]
labels = [f'{a0}-{a1 - 1}' for a0, a1 in zip(bins[:-2], bins[1:-1])] + [f'≥{bins[-2]}']
df['NDOGroup'] = pd.cut(df['NDO'], bins=bins, labels=labels, right=True)

countrypt = df.pivot_table(index = 'NDOGroup',columns = 'Country', values ='Purch', aggfunc='sum', observed=False)

plt.figure(figsize = (15,9))
sns.heatmap(countrypt, cmap = 'viridis', vmin=0, linewidths=0.5, linecolor='black')

but the heatmap produced is like so:

heatmap

and when I examine the data I see that for NDO=0, the NDOGroup = NaN

NDO Purch Country NDOGroup
0 3 UK Nan
1 3 UK 0-4
1 3 UK 0-4
1 3 USA 0-4
2 3 USA 0-4

I have tried different indexing in the bins e.g. [-1,0,5], [-1,4,5] but can’t seem to get it to show the correct groups.

Is this an indexing issue? Or is there a much cleaner way possible to fix this so i get a table like below:

NDO Purch Country NDOGroup
0 3 UK 0
1 3 UK 1-5
3 3 UK 1-5
4 3 USA 1-5
5 3 USA 1-5
6 3 USA 6-10
12 3 USA 11-15

>Solution :

You could use:

df = pd.DataFrame({'NDO': [0, 1, 2, 5, 6, 10, 11, 15, 16, 91, 92, 96, 97, 1000]})

bins = [-np.inf,0,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,np.inf]
labels = ['0']+[f'{a0+1}-{a1}' for a0, a1 in zip(bins[1:-2], bins[2:-1])] + [f'≥{bins[-2]}']

df['NDOGroup'] = pd.cut(df['NDO'], bins=bins, labels=labels, right=True)

Variant:

from itertools import pairwise
bins = [-np.inf,0,5,10,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96,np.inf]
labels = [f'{a0+1}-{a1}' for a0, a1 in pairwise(bins)]
labels[0] = '0'
labels[-1] = f'≥{bins[-2]}'

Output:

     NDO NDOGroup
0      0        0
1      1      1-5
2      2      1-5
3      5      1-5
4      6     6-10
5     10     6-10
6     11    11-16
7     15    11-16
8     16    11-16
9     91    87-91
10    92    92-96
11    96    92-96
12    97      ≥96
13  1000      ≥96
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading