How to count instances of keywords in a list through pandas?

I can’t figure out how to get Python to count the instances of each keyword and put them in a list, count_of_occ. All of the counts end up in the first row.

from collections import Counter
import pandas as pd
import bs4
import requests
keywords = ["Goulds", "Pump", "http"]
df = pd.DataFrame({'Keyword': keywords})
URL = ["https://www.gouldspumps.com/en-US/Home/"]
for r in URL:
    ctnt=requests.get(r, verify=False)
output_content = str(ctnt.content)
count_of_occ = [0 for i in keywords]
for num in count_of_occ:
    for i in keywords:
        if i in output_content:
            count_of_occ[num] += output_content.count(i)
df['Occurence'] = pd.Series(count_of_occ, index=df.index)
print(df)
Output:
  Keyword  Occurence
0  Goulds        429
1    Pump          0
2    http          0

Solution:

Here’s an example that demonstrates the issue:

count_of_occ = [0, 0, 0]

for num in count_of_occ:
    count_of_occ[num] += 5

print(count_of_occ)

You might expect the above to output [5, 5, 5], but it actually prints [15, 0, 0]. The for loop runs three iterations, but because it iterates over the values in the list, and the second and third values are still 0 when they are read, num is always 0, so count_of_occ[num] is always the first list item.
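To make the difference concrete, here is a minimal sketch that puts the value-based loop next to an index-based one. The names buggy and fixed are only illustrative and do not come from the question's code.

```python
# Iterating over a list yields its values, not its positions.
counts = [0, 0, 0]
for value in counts:
    # Every value read from the list is 0, so index 0 is updated each time.
    counts[value] += 5
buggy = counts

# enumerate() yields (index, value) pairs, so each position is updated once.
counts = [0, 0, 0]
for idx, _ in enumerate(counts):
    counts[idx] += 5
fixed = counts

print(buggy)  # [15, 0, 0]
print(fixed)  # [5, 5, 5]
```

Using range(len(counts)) instead of enumerate() would work just as well here; the point is only that the loop variable must be a position rather than a stored value.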

Here’s a script that accomplishes what you’re looking for:

import pandas as pd
import requests

keywords = ["Goulds", "Pump", "http"]
df = pd.DataFrame({'Keyword': keywords})
URL = ["https://www.gouldspumps.com/en-US/Home/"]
for r in URL:
    ctnt = requests.get(r, verify=False)
output_content = str(ctnt.content)
df['Occurence'] = pd.Series(output_content.count(i) for i in keywords)
print(df)
Output:
  Keyword  Occurence
0  Goulds         47
1    Pump         81
2    http         15

I used a generator expression in place of the count_of_occ list, but you can also do the following:

count_of_occ = [0 for i in keywords]
for i in range(len(count_of_occ)):
    if keywords[i] in output_content:
        count_of_occ[i] += output_content.count(keywords[i])
df['Occurence'] = pd.Series(count_of_occ, index=df.index)
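If you want to check the counting logic without making a network request, the sketch below runs it against a small made-up string. The helper count_keywords and the sample text are hypothetical and not part of the original script. Keep in mind that str.count is case-sensitive and counts non-overlapping matches, so "Pump" also matches inside "Pumps" but does not match "pump".

```python
def count_keywords(text, keywords, ignore_case=False):
    """Return one count per keyword, in the same order as keywords.

    Hypothetical helper for illustration; it relies only on str.count.
    """
    if ignore_case:
        # Lowercase both sides so "pump" and "Pump" are counted together.
        text = text.lower()
        keywords = [kw.lower() for kw in keywords]
    return [text.count(kw) for kw in keywords]


# Made-up sample text standing in for the downloaded page.
sample = "Goulds Pumps: pump parts, Pump service, https://example.com"
keywords = ["Goulds", "Pump", "http"]

exact = count_keywords(sample, keywords)
relaxed = count_keywords(sample, keywords, ignore_case=True)

print(exact)    # [1, 2, 1]
print(relaxed)  # [1, 3, 1]
```

Because the result is a plain list in keyword order, it can be assigned to the DataFrame column the same way as count_of_occ above.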