Advertisements
Let’s say I have a dataframe with two timeseries variables as follows:
Temperature | Humidity |
---|---|
17.6 | 88 |
22.1 | 81 |
13.6 | 88 |
26.4 | 71 |
25.6 | 72 |
Here is the pd.DataFrame
for it:
data = {
'Temperature': [17.6, 22.1, 13.6, 26.4, 25.6],
'Humidity': [88, 81, 88, 71, 72]
}
df = pd.DataFrame(data)
I would like to obtain a table, which has these two variables, binned as the axes, and the values being the frequency (nominal or relative) of the combined condition.
For example, the end output would be formatted with binned temperatures as columns and binned humidity as the index:
0-4 | 4-8 | 8-12 | 12-16 | 16-20 | 20-24 | 24-28 | 28-32 | |
---|---|---|---|---|---|---|---|---|
0-10 | ||||||||
10-20 | ||||||||
20-30 | ||||||||
30-40 | ||||||||
40-50 | ||||||||
50-60 | ||||||||
60-70 | ||||||||
70-80 | ||||||||
80-90 | ||||||||
90-100 |
I know how to bin and get the counts for one binning, but I’m really not sure how to count the combined occurrences of two binned conditions. Can anyone please point me in the right direction?
df['T_binned'] = pd.cut(df['Temperature'],bins=np.arange(0,32,4))
df['H_binned'] = pd.cut(df['Humidity'],bins=np.arange(0,100,10))
counts_T = df.groupby(by='T_binned').count()
counts_H = df.groupby(by='H_binned').count()
>Solution :
Use a crosstab
:
pd.crosstab(pd.cut(df['Temperature'],bins=np.arange(0,32,4)),
pd.cut(df['Humidity'],bins=np.arange(0,100,10)))
Or for all categories:
pd.crosstab(pd.cut(df['Temperature'],bins=np.arange(0,32,4)),
pd.cut(df['Humidity'],bins=np.arange(0,100,10)),
dropna=False
)
Output:
Humidity (0, 10] (10, 20] (20, 30] (30, 40] (40, 50] (50, 60] (60, 70] (70, 80] (80, 90]
Temperature
(0, 4] 0 0 0 0 0 0 0 0 0
(4, 8] 0 0 0 0 0 0 0 0 0
(8, 12] 0 0 0 0 0 0 0 0 0
(12, 16] 0 0 0 0 0 0 0 0 1
(16, 20] 0 0 0 0 0 0 0 0 1
(20, 24] 0 0 0 0 0 0 0 0 1
(24, 28] 0 0 0 0 0 0 0 2 0