Get the leading words of a string up to a fixed pattern in pandas

I have a pandas DataFrame in which one of the columns holds strings. From that column I only want the words that come before a date (the date is also part of the string).
The problem is that I don't know how many words there are in front of the date.

The rows of the string column look like the following:

word1 word2 word3 02/08/2022 XXX XXX XXX
word1 04/09/2019 XXX XXX XXX
word1 word2 word3 word4 10/12/2021 XXX XXX XXX
word1 word2 30/11/2022 XXX XXX XXX

So I want only:

word1 word2 word3
word1
word1 word2 word3 word4
word1 word2

Each 'XXX' stands for a word; I do not know in advance how many of these follow the date.

Can someone help me with this problem?

>Solution :

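We can use Series.str.split with a regex pattern that matches the date, keep the part before it, and strip the trailing space:

import pandas as pd

s = pd.Series(["word1 word2 word3 02/08/2022 XXX XXX XXX", "word1 04/09/2019 XXX XXX XXX"])

# Split on the dd/mm/yyyy date, take the text before it, drop the trailing space
s.str.split(r"\d{2}/\d{2}/\d{4}").str[0].str.strip()

0    word1 word2 word3
1                word1
dtype: object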
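
The same idea can be applied directly to a DataFrame column. The sketch below is one possible variant, not the answerer's method: it uses Series.str.extract with a lazy capture group instead of splitting. The column names "text" and "prefix" are assumptions made for illustration, and the pattern assumes the date always has the form dd/mm/yyyy. Rows that contain no date would end up as NaN.

```python
import pandas as pd

# Hypothetical DataFrame; the column name "text" is an assumption.
df = pd.DataFrame({"text": [
    "word1 word2 word3 02/08/2022 XXX XXX XXX",
    "word1 04/09/2019 XXX XXX XXX",
    "word1 word2 word3 word4 10/12/2021 XXX XXX XXX",
    "word1 word2 30/11/2022 XXX XXX XXX",
]})

# Lazily capture everything before the first dd/mm/yyyy date;
# \s* keeps the space in front of the date out of the captured group.
pattern = r"^(.*?)\s*\d{2}/\d{2}/\d{4}"
df["prefix"] = df["text"].str.extract(pattern, expand=False)

print(df["prefix"].tolist())
```

With expand=False and a single capture group, str.extract returns a Series, so the result can be assigned straight to a new column.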