I can’t figure out how to get Python to count the occurrences of each keyword and store them in a list, count_of_occ. Right now all of the counts end up in the first row.
from collections import Counter
import pandas as pd
import bs4
import requests
keywords = ["Goulds", "Pump", "http"]
df = pd.DataFrame({'Keyword': keywords})
URL = ["https://www.gouldspumps.com/en-US/Home/"]
for r in URL:
    ctnt = requests.get(r, verify=False)
    output_content = str(ctnt.content)
    count_of_occ = [0 for i in keywords]
    for num in count_of_occ:
        for i in keywords:
            if i in output_content:
                count_of_occ[num] += output_content.count(i)
    df['Occurence'] = pd.Series(count_of_occ, index=df.index)
print(df)
Output:
  Keyword  Occurence
0  Goulds        429
1    Pump          0
2    http          0
>Solution :
Here’s an example that demonstrates the issue:
count_of_occ = [0, 0, 0]
for num in count_of_occ:
    count_of_occ[num] += 5
print(count_of_occ)
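You might expect the above to output [5, 5, 5], but it actually prints [15, 0, 0]. The for-loop has 3 iterations, but since it iterates over the values [0, 0, 0] rather than the positions, num is always 0, so count_of_occ[num] is always the first list item. In your code, the counts for all three keywords are therefore added to the first item on every pass, which is why it ends up at 429, three times the sum of the individual counts (47 + 81 + 15).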
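To make the difference concrete, here is a minimal sketch that runs the same loop twice, once over the list's values and once over its positions. The increment of 5 is just the placeholder from the example above, and the names fixed and idx are made up for illustration.

```python
count_of_occ = [0, 0, 0]

# Iterating over the values: num is 0 on every pass, so only index 0 changes.
for num in count_of_occ:
    count_of_occ[num] += 5
print(count_of_occ)  # [15, 0, 0]

fixed = [0, 0, 0]

# Iterating over the positions: idx takes the values 0, 1, 2 in turn.
for idx in range(len(fixed)):
    fixed[idx] += 5
print(fixed)  # [5, 5, 5]
```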
Here’s a script that accomplishes what you’re looking for:
import pandas as pd
import requests
keywords = ["Goulds", "Pump", "http"]
df = pd.DataFrame({'Keyword': keywords})
URL = ["https://www.gouldspumps.com/en-US/Home/"]
for r in URL:
    ctnt = requests.get(r, verify=False)
    output_content = str(ctnt.content)
df['Occurence'] = pd.Series(output_content.count(i) for i in keywords)
print(df)
  Keyword  Occurence
0  Goulds         47
1    Pump         81
2    http         15
I used a generator expression in place of the count_of_occ list, but you can also do the following:
count_of_occ = [0 for i in keywords]
for i in range(len(count_of_occ)):
    if keywords[i] in output_content:
        count_of_occ[i] += output_content.count(keywords[i])
df['Occurence'] = pd.Series(count_of_occ, index=df.index)
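If you want to check the counting logic without hitting the website, something like the following sketch may help. It runs the same str.count approach on a fixed string; the function name count_keywords and the sample text are made up for illustration and are not part of the original script.

```python
def count_keywords(text, keywords):
    """Map each keyword to its number of non-overlapping occurrences in text."""
    # str.count is case-sensitive and matches substrings, so "Pump" also
    # matches inside "Pumps".
    return {word: text.count(word) for word in keywords}


# Hypothetical sample text standing in for the downloaded page content.
sample_text = "Goulds Pumps: Pump parts at http://example.com"
counts = count_keywords(sample_text, ["Goulds", "Pump", "http"])
print(counts)  # {'Goulds': 1, 'Pump': 2, 'http': 1}
```

Because the dictionary keeps the keywords in insertion order, list(counts.values()) can be passed to pd.Series in the same way as count_of_occ.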