I am trying to extract the word ‘human’ from the pages of people I look up on wikidata.org.
For example, on the page https://www.wikidata.org/wiki/Q5284, the word ‘human’ appears in the following div:
<div class="wikibase-snakview-value wikibase-snakview-variation-valuesnak"><a title="Q5" href="/wiki/Q5">human</a></div>
I am using the following code, which returns the whole line above instead of only the word ‘human’:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.wikidata.org/wiki/Q1124')
soup = BeautifulSoup(page.text,'html.parser')
x = soup.find("div", attrs={"class": "wikibase-snakview-value wikibase-snakview-variation-valuesnak", "data-value": False})
If I call .get('href') on x, I get None.
What should I do to get only the word ‘human’ as the output?
>Solution :
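find() returns the whole div tag, so printing x shows the full markup. Use x.text to get only the text, and look up the nested a tag before asking for href, because the div itself has no href attribute: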
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.wikidata.org/wiki/Q1124')
soup = BeautifulSoup(page.text, 'html.parser')
x = soup.find("div", attrs={"class":"wikibase-snakview-value wikibase-snakview-variation-valuesnak"})
# to get the word 'human'
print(x.text)
# to get the 'href'
print(x.find('a').get('href'))
Output:
human
/wiki/Q5
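If the div is missing from a page, find() returns None and x.text raises an AttributeError. The following is a minimal sketch of a more defensive variant, assuming the markup matches the div quoted in the question. It parses that snippet inline instead of fetching the page, and the helper name extract_label_and_href is made up for illustration. It uses BeautifulSoup's select_one() with a CSS selector, which is a different lookup from the find() call above, not a replacement the answer requires.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; no network request is made here.
HTML = (
    '<div class="wikibase-snakview-value wikibase-snakview-variation-valuesnak">'
    '<a title="Q5" href="/wiki/Q5">human</a></div>'
)


def extract_label_and_href(html):
    """Return (link text, href) for the first matching link, or None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match an <a> nested inside a div that carries both CSS classes.
    link = soup.select_one(
        "div.wikibase-snakview-value.wikibase-snakview-variation-valuesnak a"
    )
    if link is None:
        # No such div/link in this document.
        return None
    return link.get_text(strip=True), link.get("href")


print(extract_label_and_href(HTML))
```

For a live page, pass page.text from requests.get(...) to the helper instead of the inline string, and check for None before unpacking the result.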