Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas make new columns from sub-string of another column

I’m trying to create new columns in pandas from sub-strings of another column.

import pandas as pd
import re

df = {'title':['Apartment 2 roomns, 40 m²', 'House 7 rooms, 183 m²', 'House 4 rooms, 93 m²', 'Apartment 12 rooms, 275 m²']} 

I’m trying with regex to capture groups:

df['Name'] = df.title.str.extract(r'(^[a-zA-Z]+)', expand=True) 

This one I got a good result. But I need a column with the number of rooms (without the word "rooms") and another column with the size without "m²". I tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['Rooms'] = df.title.str.replace(r'(^[0-9]+)\s(rooms)', r'\1') #to capture only the first group, which is the number

df['Size'] = df.title.str.replace(r'(^[0-9]+)\s(m²)', r'\1') #to capture only the first group, which is the number

My output:

   Name      Rooms                         Size
0  Apartment Apartment 2 roomns, 40 m²     Apartment 2 roomns, 40 m²
1  House     House 7 rooms, 183 m²         House 7 rooms, 183 m²
2  House     House 4 rooms, 93 m²          House 4 rooms, 93 m²
3  Apartment Apartment 12 rooms, 275 m²    Apartment 12 rooms, 275 m²

Good output:

   Name      Rooms Size
0  Apartment 2     40
1  House     7     183
2  House     4     93
3  Apartment 12    275

>Solution :

You can use

df["Rooms"] = df["title"].str.extract(r'(\d+)\s*room', expand=False)
df['Size'] = df["title"].str.extract(r'(\d+(?:\.\d+)?)\s*m²', expand=False)

Output:

>>> df
                        title Rooms Size
0   Apartment 2 roomns, 40 m²     2   40
1       House 7 rooms, 183 m²     7  183
2        House 4 rooms, 93 m²     4   93
3  Apartment 12 rooms, 275 m²    12  275

The (\d+)\s*room regex matches and captures into Group 1 one or more digits, and then just matches zero or more whitespaces (\s*) and then a room string.

The (\d+(?:\.\d+)?)\s*m² regex matches and captured one or more digits, and an optional string of a . and one or more digits, and then matches zero or more whitespaces and then a string.

See regex #1 demo and the regex #2 demo.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading