I have the following table
df = pd.DataFrame({'favs':{0:'chicken_panfry1_t360_ketchup',
1:'chicken_bake2_t450_out_bbq',
2:'chicken_boiled2_season_gravy'}})
That looks like this
favs
0 chicken_panfry1_t360_ketchup
1 chicken_bake2_t450_out_bbq
2 chicken_boiled2_season_gravy
I would like to split the column at the last underscore to create 2 new columns that look like this.
favs recipe sauce
0 chicken_panfry1_t360_ketchup chicken_panfry1_t360 ketchup
1 chicken_bake2_t450_out_bbq chicken_bake2_t450_out bbq
2 chicken_boiled2_season_gravy chicken_boiled2_season gravy
This is what I’ve tried
df[['recipe','sauce']]=df['favs'].str.split(r'.*_', expand=True)
This creates the sauce column correctly but the recipe column is blank. It looks like this. Unsure of how to correct it.
favs recipe sauce
0 chicken_panfry1_t360_ketchup ketchup
1 chicken_bake2_t450_out_bbq bbq
2 chicken_boiled2_season_gravy gravy
>Solution :
You need Series.str.extract with the (.*)_(.*) regex pattern:
df[['recipe','sauce']]=df['favs'].str.extract(r'(.*)_(.*)', expand=True)
See the regex demo.
The (.*)_(.*) regex matches and captures the part before the last _ into Group 1 (with the first (.*)) and the part after last _ into the second column (with the second (.*)).