I have a pandas DataFrame in which one of the columns contains strings. From each string I only want the words that come before a date (the date is also part of the string).
The problem is that I don't know how many words there are in front of the date.
The rows of the column look like the following:
word1 word2 word3 02/08/2022 XXX XXX XXX
word1 04/09/2019 XXX XXX XXX
word1 word2 word3 word4 10/12/2021 XXX XXX XXX
word1 word2 30/11/2022 XXX XXX XXX
So I want only:
word1 word2 word3
word1
word1 word2 word3 word4
word1 word2
Each ‘XXX’ stands for a word; I do not know in advance how many of them there are.
Can someone help me with this problem?
>Solution :
We can use Series.str.split with a regex pattern that matches the date, then keep the part before the first match:
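import pandas as pd
s = pd.Series(["word1 word2 word3 02/08/2022 XXX XXX XXX", "word1 04/09/2019 XXX XXX XXX"])
# Raw-string regex; the leading \s* also drops the space in front of the date
s.str.split(r"\s*\d{2}/\d{2}/\d{4}").str[0]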
0 word1 word2 word3
1 word1
dtype: object
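As a possible alternative, the same idea can be sketched with Series.str.extract applied directly to a DataFrame column. The column names "text" and "prefix" are assumptions made for illustration, and the pattern only checks for a dd/mm/yyyy-shaped token; it does not validate that the date is real.

```python
import pandas as pd

# Hypothetical DataFrame; the column name "text" is an assumption.
df = pd.DataFrame({
    "text": [
        "word1 word2 word3 02/08/2022 XXX XXX XXX",
        "word1 04/09/2019 XXX XXX XXX",
        "word1 word2 word3 word4 10/12/2021 XXX XXX XXX",
        "word1 word2 30/11/2022 XXX XXX XXX",
    ]
})

# Lazily capture everything up to the first dd/mm/yyyy-shaped token.
# expand=False returns a Series because there is a single capture group.
df["prefix"] = df["text"].str.extract(r"^(.*?)\s*\d{2}/\d{2}/\d{4}", expand=False)

print(df["prefix"].tolist())
```

One difference in behaviour: for a row that contains no date, str.extract yields NaN, whereas the split approach returns the whole string unchanged. Which one is preferable depends on how such rows should be handled.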