Create new columns in pandas df by grouping and performing operations on an existing column

I have a dataframe that looks like this (minimal reproducible example):

import pandas as pd

# Thermometer IDs: the part before '_' identifies the group
thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']

temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]

df_set = {'thermometers': thermometers,
          'Temperatures': temperatures}

df = pd.DataFrame(df_set)

Index	thermometers	Temperatures
0	T-10000_0001	15.1
1	T-10000_0002	14.9
2	T-10000_0003	12.7
3	T-10000_0004	10.8
4	T-10001_0001	19.8
5	T-10001_0002	18.3
6	T-10001_0003	17.7
7	T-10001_0004	18.1
8	T-10002_0001	20.0
9	T-10002_0002	16.4
10	T-10002_0003	17.6
11	T-10002_0004	19.3

I am trying to group the thermometers by their prefix (i.e. ‘T-10000’, ‘T-10001’, ‘T-10002’) and create new columns with the minimum, average and maximum temperature of each group. My final dataframe would look like this:

Index	Thermometer	min_temp	average_temp	max_temp
0	T-10000	10.8	13.4	15.1
1	T-10001	17.7	18.5	19.8
2	T-10002	16.4	18.3	20.0

I tried creating a separate function, which I think requires a regular expression, but I’m unable to figure out how to go about it. Any help would be much appreciated.

Solution:

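Use groupby on the part of each thermometer name that comes before the delimiter _ (split the string and take the first piece). Then select the Temperatures column and aggregate it with whatever functions you need.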

>>> groups = df['thermometers'].str.split('_').str.get(0)
>>> df.groupby(groups)['Temperatures'].agg(['min', 'mean', 'max'])

                      min    mean   max
thermometers                           
T-10000              10.8  13.375  15.1
T-10001              17.7  18.475  19.8
T-10002              16.4  18.325  20.0
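
To get the exact column names from the question (min_temp, average_temp, max_temp), pandas' named aggregation could be used. The following is only a sketch built on the sample data from the question: the variable names key and result are made up for illustration, and rounding to one decimal place is an assumption taken from the desired table.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'thermometers': ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                     'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                     'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004'],
    'Temperatures': [15.1, 14.9, 12.7, 10.8,
                     19.8, 18.3, 17.7, 18.1,
                     20.0, 16.4, 17.6, 19.3],
})

# Grouping key: everything before the first underscore, e.g. 'T-10000'.
# Renaming the Series sets the name of the group column in the result.
key = df['thermometers'].str.split('_').str.get(0).rename('Thermometer')

# Named aggregation: output_column=(input_column, function).
result = (
    df.groupby(key)
      .agg(min_temp=('Temperatures', 'min'),
           average_temp=('Temperatures', 'mean'),
           max_temp=('Temperatures', 'max'))
      .round(1)        # one decimal place, as in the desired table (assumption)
      .reset_index()   # turn the group label back into a regular column
)

print(result)
```

For the sample data this should give 10.8, 13.4 and 15.1 for T-10000. If a regular expression is preferred over splitting, df['thermometers'].str.extract(r'^(T-\d+)', expand=False) should produce the same grouping key, assuming every name starts with T- followed by digits.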
