Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get a particular text from a column in a dataframe

I have a dataset where I have a dataframe[‘Title’] column with car brand and accessories information.

enter image description here

I want to 2 new columns dataframe[‘Brand’] and dataframe[‘model’], where I want to get the brand name of the vehicle and the model of the vehicle. Like Mahindra brand and XUV300 as per the first record. and want brand –> universal and model –> NaN if the record is like the second entry –> Blaupunkt Colombo 130 BT.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

What I tried:-

brand = []
for i in vehicle_make:
    for j in range(len(df2['Title'])):
        text = df2['Title'][j].lstrip().lower()
        print(text)
        if i in text:
            df2['brand'][j] = i
            print("yes")
        else:
            df2['brand'][j] = 'Unversal'
            print('No')

where vehicle_make contains brand names.

['ford',
 'honda',
 'hyundai',
 'Kia',
 'mahindra',
 'maruti',
 'mg',
 'nissan',
 'renault',
 'skoda',
 'tata',
 'toyota',
 'volkswagen']

which I scrapped from the same website.

The above code is working but it’s not picking all the values.

mahindra xuv 300 led door foot step sill plate mirror finish black glossy
No
blaupunkt colombo 130 bt
No
nissan terrano mud flap /mud guard
No
mg hector plus 4d boss leatherite car floor mat black( without grass mat)
No
ford endeavour body protection waterproof car cover (grey)
yes
starid wiper blade framless for volkswagen polo (size 24' and 16'' ) black
No
mahindra tuv300 rain door visor without chrome line (2015-2017)
No

This is the output I am getting. What is wrong here?

>Solution :

This answer assumes that all text following a brand name would correspond to the model. We can form a regex alternation of brand names, and then use str.extract with this alternation.

brands = ['ford', 'honda', 'hyundai', 'Kia', 'mahindra', 'maruti', 'mg', 'nissan', 'renault', 'skoda', 'tata', 'toyota', 'volkswagen']
regex = r'\b(' + r'|'.join(brands) + r') (.*)'
df2[["Title", "Brand"]] = df2["Title"].str.extract(regex, flags=re.I)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading