Web Scraper sectioned in different files

I have been working on a Python scraper for a while.
I want to save the information I get in separate files: the URLs in one file and the captions in another.

There is no issue with the URLs, but when I try to scrape the names of the blogs I'm searching for, I get this result:

w
a
t
a
s
h
i
n
o
s
e
k
a
i
s
w
o
r
l
d
v
-
a
-
p
-
o
-
r
-
s
-
m
-
u
-
t
b
l
a
c
k
e
n
e
d
d
e
a
t
h
e
y
e
5
h
i
n
y
8
l
a
z
e
2
o
m
b
i
e
p
o
r
y
g
o
n
-
d
i
g
i
t
a
l
v
a
p
o
r
w
a
v
e
b
o
m
b
s
u
b
t
l
e
a
n
i
m
e
v
a
p
o
r
w
a
v
e
c
o
r
p
f
i
r
m
i
m
a
g
e

I think the problem is related to the '\n', but I have not been able to find a solution.


This is my code:

import requests
from bs4 import BeautifulSoup

search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue


archivo = open ("Sites.txt", "w")
for source, image_urls in data.items():
    for url in image_urls:
        archivo.write(url + '\n')
archivo.close()


archivo = open ("Source.txt", "w")
for source, image_urls in data.items():
    for sources in source:
        archivo.write(sources + '\n')
archivo.close()

Solution:

Change the last loop to:

archivo = open("Source.txt", "w")
for source in data:
    archivo.write(source + "\n")
archivo.close()

Then the content of Source.txt will be:

harshvardhan25
mikeahrens
amazinglybeautifulphotography
landscaperrosebay
danielapelli
sahrish-acrylic-painting
sweetd3lights
pensamentsisomnis
pics-bae
oneshotolive
scattopermestesso
huariqueje
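The original loop failed because `for sources in source:` iterates over the string `source` itself, and iterating a Python string yields one character at a time, each of which was then written on its own line. Iterating the dict yields whole keys instead. A minimal demonstration (the dict contents here are placeholders):

```python
source = "huariqueje"

# Iterating a string yields single characters, which is why the original
# loop wrote each letter of the blog name on its own line
assert list(source) == ["h", "u", "a", "r", "i", "q", "u", "e", "j", "e"]

# Iterating a dict yields its keys, i.e. one whole blog name per iteration
data = {"huariqueje": ["url1"], "pics-bae": ["url2"]}
assert list(data) == ["huariqueje", "pics-bae"]
```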

Or, using a with statement:

with open("Source.txt", "w") as archivo:
    archivo.write("\n".join(data))
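For completeness, both files can be written with `with` blocks, which close the file automatically even if an error occurs. A sketch assuming `data` has the same shape the scraper builds (blog name mapped to a list of image URLs; the example URLs below are placeholders):

```python
# data maps each blog name to its list of image URLs, as built by the scraper
data = {
    "harshvardhan25": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "mikeahrens": ["https://example.com/c.jpg"],
}

# Sites.txt: one image URL per line
with open("Sites.txt", "w") as archivo:
    for image_urls in data.values():
        for url in image_urls:
            archivo.write(url + "\n")

# Source.txt: one blog name per line (iterate the dict, not a string)
with open("Source.txt", "w") as archivo:
    archivo.write("\n".join(data))
```

Note that `"\n".join(data)` does not add a trailing newline after the last name, while the loop version writes one after every line.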