Get joint-frequency of two binned variables from pandas dataframe?

Advertisements

Let’s say I have a dataframe with two timeseries variables as follows:

Temperature Humidity
17.6 88
22.1 81
13.6 88
26.4 71
25.6 72

Here is the pd.DataFrame for it:

data = {
    'Temperature': [17.6, 22.1, 13.6, 26.4, 25.6],
    'Humidity': [88, 81, 88, 71, 72]
}

df = pd.DataFrame(data)

I would like to obtain a table, which has these two variables, binned as the axes, and the values being the frequency (nominal or relative) of the combined condition.

For example, the end output would be formatted with binned temperatures as columns and binned humidity as the index:

0-4 4-8 8-12 12-16 16-20 20-24 24-28 28-32
0-10
10-20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100

I know how to bin and get the counts for one binning, but I’m really not sure how to count the combined occurrences of two binned conditions. Can anyone please point me in the right direction?

df['T_binned'] = pd.cut(df['Temperature'],bins=np.arange(0,32,4))
df['H_binned'] = pd.cut(df['Humidity'],bins=np.arange(0,100,10))
counts_T = df.groupby(by='T_binned').count()
counts_H = df.groupby(by='H_binned').count()

>Solution :

Use a crosstab:

pd.crosstab(pd.cut(df['Temperature'],bins=np.arange(0,32,4)),
            pd.cut(df['Humidity'],bins=np.arange(0,100,10)))

Or for all categories:

pd.crosstab(pd.cut(df['Temperature'],bins=np.arange(0,32,4)),
            pd.cut(df['Humidity'],bins=np.arange(0,100,10)),
            dropna=False
           )

Output:

Humidity     (0, 10]  (10, 20]  (20, 30]  (30, 40]  (40, 50]  (50, 60]  (60, 70]  (70, 80]  (80, 90]
Temperature                                                                                         
(0, 4]             0         0         0         0         0         0         0         0         0
(4, 8]             0         0         0         0         0         0         0         0         0
(8, 12]            0         0         0         0         0         0         0         0         0
(12, 16]           0         0         0         0         0         0         0         0         1
(16, 20]           0         0         0         0         0         0         0         0         1
(20, 24]           0         0         0         0         0         0         0         0         1
(24, 28]           0         0         0         0         0         0         0         2         0

Leave a ReplyCancel reply