I would like to create a cat_month column in my expeditions dataframe. This column should contain the mountain category (small, medium or large), assigned according to the height in the highpoint_metres column. The categories are based on quartiles (for example, small = height lower than the first quartile), but I can’t manage to do it.
Data:
import pandas as pd
expeditions = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv")
What I’ve tried:
peaks[cat_monts] =
for peak_id in expeditions :
    if "highpoint_metres" < 6226.5 : # 1st quartile
        return "petite montagne"
    elif 6226.5<"highpoint_metres" <7031.25:
        return "moyenne montagne"
    else :
        return "grande montagne"
>Solution :
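Use np.select, which accepts a list of conditions, a list of their corresponding values, and a default ("else") value.
The conditions are evaluated in order and the first one that matches wins, so the narrowest condition has to come first:
import numpy as np
conditions = {
    # checked first: heights below the first threshold
    'petite montagne': expeditions['highpoint_metres'] < 6226.5,
    # checked second: only reached by rows that failed the test above
    'moyenne montagne': expeditions['highpoint_metres'] < 7031.25,
}
# rows matching no condition (including missing heights) get the default
expeditions['cat_month'] = np.select(list(conditions.values()), list(conditions.keys()), default='grande montagne')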
Output:
      expedition_id  ...  highpoint_metres  ...         cat_month
0         ANN260101  ...            7937.0  ...   grande montagne
1         ANN269301  ...            7937.0  ...   grande montagne
2         ANN273101  ...            7937.0  ...   grande montagne
3         ANN278301  ...            7000.0  ...  moyenne montagne
4         ANN279301  ...            7160.0  ...   grande montagne
...             ...  ...               ...  ...               ...
10359     PUMO19101  ...            7138.0  ...   grande montagne
10360     PUMO19102  ...            7138.0  ...   grande montagne
10361     PUTH19101  ...            6350.0  ...  moyenne montagne
10362     RATC19101  ...            6600.0  ...  moyenne montagne
10363     SANK19101  ...            6452.0  ...  moyenne montagne
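
If you would rather not hard-code the thresholds, one possible alternative (not part of the answer above) is to compute the quartiles with Series.quantile and bin with pd.cut. This is only a sketch: the heights Series is made-up toy data standing in for expeditions['highpoint_metres'], and it assumes the upper cut-off is meant to be the third quartile, which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for expeditions["highpoint_metres"].
heights = pd.Series(
    [5800.0, 6100.0, 6350.0, 7000.0, 7160.0, 7937.0, 8848.0, np.nan],
    name="highpoint_metres",
)

# Quartile thresholds computed from the data (NaN values are skipped).
# Assumption: "medium" means between the first and third quartile.
q1, q3 = heights.quantile([0.25, 0.75])

# right=False makes each bin closed on the left:
# [-inf, q1) -> petite, [q1, q3) -> moyenne, [q3, inf) -> grande
cat = pd.cut(
    heights,
    bins=[-np.inf, q1, q3, np.inf],
    labels=["petite montagne", "moyenne montagne", "grande montagne"],
    right=False,
)

print(pd.DataFrame({"highpoint_metres": heights, "cat_month": cat}))
```

One difference to be aware of: pd.cut leaves rows with a missing height as NaN, whereas np.select sends them to the default value, because any comparison with NaN is False.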