Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas extract substring between two characters

I have a dataframe column in pandas which contains a long piece of text

            Value_column
Car:"Ford",Colour:"Black", Price:2000, 

I’d like to split this into three columns

So it would look like

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Car     Colour    Price
Ford    Black     2000

I’ve been able to do it for the first split using

df['Car']=df['Value_column].str.split("Car",expand=True).iloc[:,1:]
df['Car']=df['Car'].str[0:5]

But can’t figure out a neat way of doing it for all values. The tricky bit I’m finding is telling the code when to end. It only works for Ford because I know Ford is 4 letters long

>Solution :

Assuming your key/values don’t contain :/, characters, you can extractall and pivot:

out = (df['Value_column']
 .str.extractall(r'\s*([^:,]+):\s*\"?([^:,]+?)\"?\s*(?:,|\s*$)')
 .droplevel('match').pivot(columns=0, values=1)
 .rename_axis(columns=None)
)

Output:

   Price   Car Colour
0   2000  Ford  Black

regex demo

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading