Web Scraper sectioned in different files

I have been working on a Python scraper for a while.
I want to save the information I get in separate files: the URLs in one file and the captions in another.

There is no issue with the URLs, but when I try to scrape the names of the blogs I'm searching for, I get this result:

w
a
t
a
s
h
i
n
o
s
e
k
a
i
s
w
o
r
l
d
v
-
a
-
p
-
o
-
r
-
s
-
m
-
u
-
t
b
l
a
c
k
e
n
e
d
d
e
a
t
h
e
y
e
5
h
i
n
y
8
l
a
z
e
2
o
m
b
i
e
p
o
r
y
g
o
n
-
d
i
g
i
t
a
l
v
a
p
o
r
w
a
v
e
b
o
m
b
s
u
b
t
l
e
a
n
i
m
e
v
a
p
o
r
w
a
v
e
c
o
r
p
f
i
r
m
i
m
a
g
e

I think the problem is related to the '\n', but I have not been able to find a solution.


This is my code:

import requests
from bs4 import BeautifulSoup

search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue


archivo = open ("Sites.txt", "w")
for source, image_urls in data.items():
    for url in image_urls:
        archivo.write(url + '\n')
archivo.close()


archivo = open ("Source.txt", "w")
for source, image_urls in data.items():
    for sources in source:
        archivo.write(sources + '\n')
archivo.close()

Solution:

Change the last loop to:

archivo = open("Source.txt", "w")
for source in data:
    archivo.write(source + "\n")
archivo.close()

Then the content of Source.txt will be:

harshvardhan25
mikeahrens
amazinglybeautifulphotography
landscaperrosebay
danielapelli
sahrish-acrylic-painting
sweetd3lights
pensamentsisomnis
pics-bae
oneshotolive
scattopermestesso
huariqueje
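The original loop failed because `for sources in source:` iterates over the string `source` itself, and iterating a Python string yields one character at a time, each of which was then written on its own line. Iterating the dict yields whole keys instead. A minimal demonstration (the dict contents here are placeholders):

```python
source = "huariqueje"

# Iterating a string yields single characters, which is why the original
# loop wrote each letter of the blog name on its own line
assert list(source) == ["h", "u", "a", "r", "i", "q", "u", "e", "j", "e"]

# Iterating a dict yields its keys, i.e. one whole blog name per iteration
data = {"huariqueje": ["url1"], "pics-bae": ["url2"]}
assert list(data) == ["huariqueje", "pics-bae"]
```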

Or, using a with statement:

with open("Source.txt", "w") as archivo:
    archivo.write("\n".join(data))
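For completeness, both files can be written with `with` blocks, which close the file automatically even if an error occurs. A sketch assuming `data` has the same shape the scraper builds (blog name mapped to a list of image URLs; the example URLs below are placeholders):

```python
# data maps each blog name to its list of image URLs, as built by the scraper
data = {
    "harshvardhan25": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "mikeahrens": ["https://example.com/c.jpg"],
}

# Sites.txt: one image URL per line
with open("Sites.txt", "w") as archivo:
    for image_urls in data.values():
        for url in image_urls:
            archivo.write(url + "\n")

# Source.txt: one blog name per line (iterate the dict, not a string)
with open("Source.txt", "w") as archivo:
    archivo.write("\n".join(data))
```

Note that `"\n".join(data)` does not add a trailing newline after the last name, while the loop version writes one after every line.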