Pandas extract substring between two characters

October 18, 2023

I have a pandas dataframe column that contains a long piece of text:

            Value_column
Car:"Ford",Colour:"Black", Price:2000,

I’d like to split this into three columns, so that it looks like this:

Car     Colour    Price
Ford    Black     2000

I’ve been able to do it for the first split using:

df['Car']=df['Value_column'].str.split("Car",expand=True).iloc[:,1:]
df['Car']=df['Car'].str[0:5]

But I can’t figure out a neat way of doing it for all values. The tricky part is telling the code where to stop: it only works for Ford because I know Ford is 4 letters long.

>Solution :

Assuming your keys and values don’t contain ":" or "," characters, you can use str.extractall to capture every key/value pair and then pivot the keys into columns:

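out = (df['Value_column']
 # one row per key/value pair: column 0 holds the key, column 1 the value
 .str.extractall(r'\s*([^:,]+):\s*"?([^:,]+?)"?\s*(?:,|\s*$)')
 # drop the per-match index level, then spread the keys into columns
 .droplevel('match').pivot(columns=0, values=1)
 .rename_axis(columns=None)
)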

Output:

    Car Colour Price
0  Ford  Black  2000
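
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The second data row, the PATTERN constant and the split_pairs helper are illustrative assumptions that are not part of the original question or answer; the pattern is the same one as above, written without the redundant backslashes before the quotes.

```python
import pandas as pd

# Sample data: the first row comes from the question, the second is made up
# purely to show that the approach handles more than one row.
df = pd.DataFrame({
    "Value_column": [
        'Car:"Ford",Colour:"Black", Price:2000,',
        'Car:"Toyota",Colour:"Red", Price:3500,',
    ]
})

# Group 1 captures the key, group 2 the value (surrounding quotes excluded).
PATTERN = r'\s*([^:,]+):\s*"?([^:,]+?)"?\s*(?:,|\s*$)'


def split_pairs(column: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: turn 'key:value,' text into one column per key."""
    # extractall returns one row per match, indexed by (original row, match).
    pairs = column.str.extractall(PATTERN)
    return (
        pairs.droplevel("match")          # keep only the original row index
             .pivot(columns=0, values=1)  # keys become column names
             .rename_axis(columns=None)
    )


out = split_pairs(df["Value_column"])
# Every extracted value is a string, so convert numeric columns explicitly.
out["Price"] = pd.to_numeric(out["Price"])
print(out)
```

The explicit pd.to_numeric call is there because extractall only ever returns strings; without it, Price would stay an object column.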

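To see what the pattern captures without involving pandas, it can be tried on the sample string with the standard re module. This is only a quick check under the same assumption as the answer (no ":" or "," inside keys or values); the variable names are illustrative.

```python
import re

# Same pattern as in the answer: group 1 is the key, group 2 is the value.
PATTERN = r'\s*([^:,]+):\s*"?([^:,]+?)"?\s*(?:,|\s*$)'

text = 'Car:"Ford",Colour:"Black", Price:2000,'

# findall returns one (key, value) tuple per match.
pairs = re.findall(PATTERN, text)
record = dict(pairs)
print(record)  # {'Car': 'Ford', 'Colour': 'Black', 'Price': '2000'}
```

The lazy value group ([^:,]+?) is what answers the "where to stop" problem: it grows one character at a time until an optional closing quote followed by a comma (or the end of the string) can match, so the length of each value never has to be known in advance.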

pandas

by MR

Published October 18, 2023
