
Find "a" element in BS4 by partial class name not working?

I want to find an a element in a soup object by a substring of its class name. This particular element will always have JobTitle somewhere in its class name, with random characters before and after it, so I need to locate it by the substring JobTitle.

The element appears in the HTML of the reproducible example below.


It’s safe to assume there is only one a element to find, so find should work. However, none of my attempts (there have been more than the two shown below) have worked. I’ve also included the parent elements in case they matter for locating it.

I’m on Windows 10, Python 3.10.5, and BS4 4.11.1.

I’ve created a reproducible example below (I expected the regex approach to work, but apparently it doesn’t):

import re
from bs4 import BeautifulSoup

# Parse this HTML and get the href of the only <a> tag in it (the one inside the <h2>)
html_to_parse = """
    <li>
        <div class="cardOutline tapItem fs-unmask result job_5ef6bf779263a83c sponsoredJob resultWithShelf sponTapItem desktop vjs-highlight">
            <div class="slider_container css-g7s71f eu4oa1w0">
            <div class="slider_list css-kyg8or eu4oa1w0">
                <div class="slider_item css-kyg8or eu4oa1w0">
                <div class="job_seen_beacon">
                    <div class="fe_logo">
                    <img alt="CyberCoders logo" class="feLogoImg desktop" src="https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/256x256/f0b43dcaa7850e2110bc8847ebad087b" />
                    </div>
                    <table cellpadding="0" cellspacing="0" class="jobCard_mainContent big6_visualChanges" role="presentation">
                    <tbody>
                        <tr>
                        <td class="resultContent">
                            <div class="css-1xpvg2o e37uo190">
                            <h2 class="jobTitle jobTitle-newJob css-bdjp2m eu4oa1w0" tabindex="-1">
                                <a aria-label="full details of REMOTE Senior Python Developer" class="jcs-JobTitle css-jspxzf eu4oa1w0" data-ci="385558680" data-empn="8690912762161442" data-hide-spinner="true" data-hiring-event="false" data-jk="5ef6bf779263a83c" data-mobtk="1g9u19rmn2ea6000" data-tu="https://jsv3.recruitics.com/partner/a51b8de1-f7bf-11e7-9edd-d951492604d9.gif?client=521&amp;rx_c=&amp;rx_campaign=indeed16&amp;rx_group=110383&amp;rx_source=Indeed&amp;job=KE2-168714218&amp;rx_r=none&amp;rx_ts=20220808T034442Z&amp;rx_pre=1&amp;indeed=sp" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&amp;xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&amp;p=0&amp;fvj=0&amp;vjs=3" id="sj_5ef6bf779263a83c" role="button" target="_blank">
                                <span id="jobTitle-5ef6bf779263a83c" title="REMOTE Senior Python Developer">REMOTE Senior Python Developer</span>
                                </a>
                            </h2>
                            </div>
                        </td>
                        </tr>
                    </tbody>
                    </table>
                </div>
                </div>
            </div>
            </div>
        </div>
    </li>
"""

# Soupify it
soup = BeautifulSoup(html_to_parse, "html.parser")

# Start by making sure "find_all("a")" works
all_links = soup.find_all("a")
print(all_links)
# Good.

# Attempt 1
job_url = soup.find('a[class*="JobTitle"]').a['href']
print(job_url)
# Nope.

# Attempt 2
job_url = soup.find("a", {"class": re.compile("^.*jobTitle.*")}).a['href']
print(job_url)
# Nope...
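For reference, here is a minimal sketch of why both attempts come back empty. It uses a trimmed copy of the markup above; the shortened href "/pagead/clk?mo=r" is a placeholder, not the real link.

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down copy of the relevant markup (href shortened as a placeholder)
html = """
<h2 class="jobTitle jobTitle-newJob">
  <a class="jcs-JobTitle css-jspxzf eu4oa1w0" href="/pagead/clk?mo=r">
    <span>REMOTE Senior Python Developer</span>
  </a>
</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Attempt 1: find() treats its first argument as a tag *name*, not a CSS
# selector, so it looks for a tag literally named 'a[class*="JobTitle"]'.
attempt1 = soup.find('a[class*="JobTitle"]')
print(attempt1)  # None, so chaining .a raises AttributeError

# Attempt 2: the regex is case-sensitive and the class is "jcs-JobTitle"
# (capital J), so "jobTitle" never matches any class on the <a>.
attempt2 = soup.find("a", {"class": re.compile("^.*jobTitle.*")})
print(attempt2)  # None

# Even with the right case, a trailing .a searches for an <a> *inside*
# the <a> that was found; there is none, so .a['href'] would fail too.
link = soup.find("a", {"class": re.compile("JobTitle")})
print(link.a)  # None
```

In other words, each attempt has two problems: the lookup itself returns None, and the extra .a would still look one level too deep even if the lookup succeeded.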

Solution:

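To match a partial class name with a CSS selector, use select (or select_one) rather than find: find treats its first argument as a tag name, not as a CSS selector. select_one returns the <a> tag itself, so read href directly from it instead of chaining another .a: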

job_url = soup.select_one('a[class*="JobTitle"]')['href']
print(job_url)
# /pagead/clk?mo=r&ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&p=0&fvj=0&vjs=3
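The same lookup can also be written with find, as long as the pattern uses the actual case (JobTitle, not jobTitle) and href is read from the returned tag rather than from a chained .a. A minimal sketch, again on a trimmed copy of the markup with a placeholder href:

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down copy of the relevant markup (href shortened as a placeholder)
html = """
<h2 class="jobTitle jobTitle-newJob">
  <a class="jcs-JobTitle css-jspxzf eu4oa1w0" href="/pagead/clk?mo=r">REMOTE Senior Python Developer</a>
</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# bs4 tests a regex against each individual class value (and the full
# class string), so a plain substring pattern is enough; no ^.* padding.
link = soup.find("a", class_=re.compile("JobTitle"))
href = link["href"]  # read href from the <a> itself; no extra .a
print(href)

# If the capitalisation of the class name might vary, ignore case.
link_ci = soup.find("a", class_=re.compile("jobtitle", re.IGNORECASE))
print(link_ci["href"])

# The CSS selector from the answer, for comparison.
print(soup.select_one('a[class*="JobTitle"]')["href"])
```

All three lookups return the same <a> here. The [class*="JobTitle"] selector is a case-sensitive substring match on the whole class attribute, so re.IGNORECASE is only worth adding if the site's capitalisation is not stable.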
Discover more from Dev solutions