How to scrape a specific word from a div of pages on wikidata?

May 11, 2023

I am trying to extract the word ‘human’ from the pages of people I look up on wikidata.org.

For example, on the page https://www.wikidata.org/wiki/Q5284, the word ‘human’ appears in the following div:

<div class="wikibase-snakview-value wikibase-snakview-variation-valuesnak"><a title="Q5" href="/wiki/Q5">human</a></div>

I am using the following code, which returns the whole div above rather than only the word ‘human’:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.wikidata.org/wiki/Q1124')
soup = BeautifulSoup(page.text,'html.parser')
x = soup.find("div", attrs={"class":"wikibase-snakview-value wikibase-snakview-variation-valuesnak",  "data-value": False},)

If I call .get('href') on x, I get None.
What should I do to get the word ‘human’ only as the outcome?

Solution:

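Try it like this. x is the div itself, so x.text returns only its text content. The href belongs to the <a> tag nested inside the div, not to the div, which is why x.get('href') returns None: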

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.wikidata.org/wiki/Q1124')
soup = BeautifulSoup(page.text, 'html.parser')

x = soup.find("div", attrs={"class":"wikibase-snakview-value wikibase-snakview-variation-valuesnak"})

# to get the word 'human'
print(x.text)
# to get the 'href'
print(x.find('a').get('href'))

output:
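
As a variation on the snippet above, the same lookup can be sketched with a CSS selector that goes straight to the <a> tag inside the div. To keep it runnable without a network request, this sketch parses the div quoted in the question instead of fetching the live page; the variable names (html, link, label, href) are illustrative and not part of the original answer. On a real page there may be many divs with these classes, and select_one, like find, returns only the first match, so the None check is there in case nothing matches.

```python
from bs4 import BeautifulSoup

# The div quoted in the question, used here as an offline fixture
# (assumption: the live page contains markup of the same shape).
html = (
    '<div class="wikibase-snakview-value wikibase-snakview-variation-valuesnak">'
    '<a title="Q5" href="/wiki/Q5">human</a></div>'
)

soup = BeautifulSoup(html, "html.parser")

# CSS selector: the first <a> inside a div that carries both classes.
link = soup.select_one(
    "div.wikibase-snakview-value.wikibase-snakview-variation-valuesnak a"
)

if link is None:
    # Nothing matched; avoid calling methods on None.
    label, href = None, None
else:
    label = link.get_text(strip=True)  # text inside the <a> tag
    href = link.get("href")            # attribute on the <a> tag, not the div

print(label)  # human
print(href)   # /wiki/Q5
```

To run this against the live page, replace html with requests.get(...).text as in the code above.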