I’m working with a dataframe where one of the columns is like this:
Rating
4.8 out of 5 stars
4.0 out of 5 stars
4.5 out of 5 stars
and I want to keep only the leading number from each value, e.g.
Rating
4.8
4.0
4.5
How can I solve it?
>Solution :
To extract a field from a string (or categorical) column’s text, use pandas Series.str.extract with a regex:
import pandas as pd
df = pd.DataFrame({'Rating': ['4.8 out of 5 stars', '4.0 out of 5 stars', '4.5 out of 5 stars']}, dtype='category')
# Raw string so that \. reaches the regex engine as an escaped dot
df['Rating'].str.extract(r'([1-5]\.[0-9])')
     0
0  4.8
1  4.0
2  4.5
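You can tweak the regex as needed; see the pandas documentation for Series.str.extract. As written, it assumes every rating starts with a digit from 1 to 5 and has exactly one decimal place, so an integer rating such as "4 out of 5 stars" would not match.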
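If some ratings might be integers, or you want numbers rather than strings, here is a minimal sketch of one possible tweak. It assumes the number always comes at the start of the text; the sample row "4 out of 5 stars" is made up for illustration and is not from the question.

```python
import pandas as pd

# Sample data; the integer-style row is a hypothetical addition for illustration.
df = pd.DataFrame({'Rating': ['4.8 out of 5 stars',
                              '4 out of 5 stars',
                              '4.5 out of 5 stars']})

# Capture the leading number, with or without a decimal part.
# expand=False returns a Series instead of a one-column DataFrame.
rating = df['Rating'].str.extract(r'^(\d+(?:\.\d+)?)', expand=False)

# str.extract returns strings; convert them to numbers.
# Rows where the regex did not match stay NaN.
df['Rating'] = pd.to_numeric(rating, errors='coerce')
print(df)
```

Anchoring the pattern with ^ keeps it from picking up the "5" in "out of 5" when the leading number is missing.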