Split text at specific character in BeautifulSoup

July 13, 2023

I am brand new to Python and BeautifulSoup so please forgive the lack of proper vocabulary in my question.

I am trying to extract the list from this webpage: http://spajournalism.com/membership/. I want all the publications that are associated with a specific university. I’d like to end up with a list of dictionaries like:
[{publication_url: url1, publication_name: name1, uni: uni1}, {publication_url: url2, publication_name: name2, uni: uni2}]

Unfortunately, the HTML on the webpage is quite messy, and it’s proving tricky. My code is currently:

import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below needs lxml installed, but not imported

url = "http://spajournalism.com/membership/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "lxml")

section = soup.find("div", "entry-content clearfix")
links = section.find_all("a")

#list = []

#for link in links:
#    publication = {
#        "Link" : link.get("href"),
#        "Publication" : link.parent.text
#    }

for link in links:
    print("Link: ", link.get("href"), "Text: ", link.parent.text)

This returns a list of the following nature:

Link:  http://www.swanseastudentmedia.com/waterfront/ Text:  The Waterfront – Swansea University
Link:  https://www.seren.bangor.ac.uk/ Text:  Y Seren – Bangor University
...etc

Instead of getting all the text in one go with link.parent.text, I would like to split it at the dash ( – ) and get something more like:

Link:  http://www.swanseastudentmedia.com/waterfront/ Text:  The Waterfront University: Swansea University
Link:  https://www.seren.bangor.ac.uk/ Text:  Y Seren University: Bangor University
...etc

I have tried something like the following:

for link in links:
    text = link.parent.text
    linktext = link.string
    text.replace(linktext, " ") # Replace the redundant link text with nothing

    print("Link: ", link.get("href"), "Publication: ", linktext, "University: ", text)

But replacing the redundant text doesn’t seem to work, because what I get is:

Link:  http://www.swanseastudentmedia.com/waterfront/ Publication:  The Waterfront University:  The Waterfront – Swansea University
Link:  https://www.seren.bangor.ac.uk/ Publication:  Y Seren University:  Y Seren – Bangor University
...etc

Is there a way of doing this? Any searches I do are full of results referring to something called Dash which isn’t relevant to me. Thanks 🙂

>Solution :

The replace() method doesn’t change the text variable in place; it returns a new string, so the result has to be assigned back.
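A minimal sketch of that behaviour, using one of the sample strings from the question (the variable names here are only for illustration):

```python
text = "The Waterfront – Swansea University"

# Calling replace() without using its return value leaves `text` untouched,
# because Python strings are immutable.
text.replace("The Waterfront", "")
after_discarded_call = text

# Assigning the result back is what actually updates the variable.
text = text.replace("The Waterfront", "")

print(repr(after_discarded_call))
print(repr(text))
```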

Change

text.replace(linktext, " ")

to

text = text.replace(linktext, " ").split("–", 1)[-1].strip()
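As a rough sketch of how this could feed the list of dictionaries asked for in the question: the helper name `split_entry` and the hard-coded `rows` are assumptions for illustration. In the real loop the three values would come from `link.get("href")`, `link.string` and `link.parent.text`.

```python
def split_entry(parent_text, link_text, sep="–"):
    """Return (publication, university) from text like 'Name – University'."""
    # Drop the link text, keeping the new string that replace() returns.
    remainder = parent_text.replace(link_text, " ", 1)
    # Keep what follows the first separator and trim the surrounding spaces.
    university = remainder.split(sep, 1)[-1].strip()
    return link_text.strip(), university


# Sample values copied from the output shown in the question.
rows = [
    ("http://www.swanseastudentmedia.com/waterfront/",
     "The Waterfront", "The Waterfront – Swansea University"),
    ("https://www.seren.bangor.ac.uk/",
     "Y Seren", "Y Seren – Bangor University"),
]

publications = []
for href, link_text, parent_text in rows:
    name, uni = split_entry(parent_text, link_text)
    publications.append(
        {"publication_url": href, "publication_name": name, "uni": uni}
    )

for publication in publications:
    print(publication)
```

Note that `link.string` can be `None` when the anchor contains nested tags, so `link.get_text()` may be the safer choice on messy markup. The separator is assumed to be the en dash shown in the question's output; if a line has no separator, the whole remaining text ends up in `uni`.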

beautifulsoup

byMR
