Python: string not splitting correctly at "|||" substring

I have a column in a pandas DataFrame that stores long strings, in which the different chunks of information are separated by "|||".
This is an example:

"intermediation|"mechanical turk"|precarious "public policy" ||| intermediation|"mechanical turk"|precarious high-level

I need to split this column into multiple columns, each column containing the string between the separators "|||".

However, when I run the following code:

df['query_ids'].str.split('|||', n=5, expand = True)

Instead, I get a split at every single character, like this:

     0   1  2  3  4                                                  5
0        "  r  e  g  ulatory capture"|"political lobbying" policy-m...

I suspect it’s because "|" is a Python operator, but I cannot think of a suitable workaround.

> Solution:

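You need to escape |. When the pattern is longer than one character, str.split treats it as a regular expression by default, and in a regex "|" means alternation. An unescaped '|||' therefore matches the empty string at every position, which is why the split happens at every character. Use a raw string so the backslashes reach the regex engine unchanged:

df['query_ids'].str.split(r'\|\|\|', n=5, expand=True)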

or pass regex=False (available in pandas 1.4 and later) so that the separator is treated as a literal string:

df['query_ids'].str.split('|||', n=5, expand=True, regex=False)
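
To see why the unescaped pattern misbehaves, here is a minimal sketch using only Python's built-in re module, which, as far as I understand, is what pandas relies on for regex splits. The sample string is a shortened, hypothetical variant of the data in the question, and the variable names are only for illustration.

```python
import re

# Shortened, hypothetical sample modelled on the question's data.
text = 'intermediation|"mechanical turk" ||| precarious high-level'

# Unescaped: "|" is regex alternation, so '|||' reads as
# "empty OR empty OR empty OR empty". It matches the empty string at
# every position, so each split consumes a single character.
per_char = re.split('|||', text, maxsplit=5)

# Escaped: each "\|" matches one literal pipe, so only the real
# three-pipe separator triggers a split.
escaped = re.split(r'\|\|\|', text)

# A plain string split never interprets the separator as a regex,
# which is the same idea as passing regex=False to pandas.
literal = text.split('|||')

print(per_char)
print(escaped)
print(literal)
```

The first result starts with an empty string followed by single characters, the same shape as the output in the question, while the other two results each contain just the two chunks. Call .str.strip() on the resulting columns if the spaces around the separator are not wanted.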