I want to write a program that counts the likes of a YouTube channel.
This is my code.
import re
import requests
from bs4 import BeautifulSoup
r = requests.get("https://filmot.com/channel/UCX6OQ3DkcsbYNE6H8uQQuVA")
soup = BeautifulSoup(r.text , "html.parser")
val=soup.find_all("span",attrs={"class":"badge"})
res = re.findall(r"class=\"fa fa-thumbs-up\"></i>(.*)\<" , str(val))
print(res)
But it returns this result:
['404.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">8m1s</span>, <span class="badge">18 Dec 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>10M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>957.2K</span>, <span class="badge">Entertainment</span>, <span class="badge">12m9s</span>, <span class="badge">16 Dec 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>14.6M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.4M</span>, <span class="badge">Entertainment</span>, <span class="badge">12m4s</span>, <span class="badge">10 Dec 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>11.3M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.1M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>5.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">11m1s</span>, <span class="badge">24 Nov 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>17.5M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>2.8M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>3.5K</span>, <span class="badge">Entertainment</span>, <span class="badge">25m41s</span>, <span class="badge">29 Oct 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>17M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>2M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>6K</span>, <span class="badge">Entertainment</span>, <span class="badge">4m55s</span>, <span class="badge">23 Oct 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>19.4M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.4M</span>, <span class="badge"><i aria-hidden="true" class="fa 
fa-thumbs-down"></i>12.5K</span>, <span class="badge">Entertainment</span>, <span class="badge">15m42s</span>, <span class="badge">12 Oct 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>127.7K</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>15.3K</span>, <span class="badge">Entertainment</span>, <span class="badge">5m20s</span>, <span class="badge">26 Sep 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>7.7M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>777.1K</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>6.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">8m2s</span>, <span class="badge">04 Sep 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>48.4M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>2.5M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>24.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">12m40s</span>, <span class="badge">31 Aug 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>69.8M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>3M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>38.6K</span>, <span class="badge">Entertainment</span>, <span class="badge">19m25s</span>, <span class="badge">07 Aug 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>53.3M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>2.2M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>29.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">16m40s</span>, <span class="badge">24 Jul 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>44.6M</span>, <span class="badge"><i 
aria-hidden="true" class="fa fa-thumbs-up"></i>1.7M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>21.4K</span>, <span class="badge">Entertainment</span>, <span class="badge">10m45s</span>, <span class="badge">10 Jul 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>42.2M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.7M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>24.1K</span>, <span class="badge">Entertainment</span>, <span class="badge">11m34s</span>, <span class="badge">26 Jun 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>53.6M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.8M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>30.6K</span>, <span class="badge">Entertainment</span>, <span class="badge">12m33s</span>, <span class="badge">12 Jun 2021</span>, <span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>49.5M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>1.9M</span>, <span class="badge"><i aria-hidden="true" class="fa fa-thumbs-down"></i>29.2K</span>, <span ....
I tested the pattern on regex101.com and the result there was correct.
>Solution :
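If you want to use regex, a positive lookbehind is a good fit here, e.g.
(?<=class=\"fa fa-thumbs-up\"></i>)[\d\w.]+ as in res = re.findall(r"(?<=class=\"fa fa-thumbs-up\"></i>)[\d\w.]+", str(val)). The lookbehind only checks that the icon markup precedes the match, and [\d\w.]+ then matches just the count (the \d is redundant, since \w already covers digits). The .* in your pattern is the problem: . matches any character except a newline and * repeats it zero or more times, as many times as possible (it’s a greedy operator). The match therefore runs to the end of the line and backtracks only as far as the last <, so a single result swallows every badge after the first thumbs-up icon. Making it lazy, (.*?)<, would also stop at the first <.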
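To see the difference side by side, here is a minimal sketch that runs the greedy, lazy, and lookbehind patterns over a short sample string. The sample is a hand-written stand-in for str(val), not real page output, and the names html and marker are only for illustration.

```python
import re

# Hypothetical sample imitating str(val): several badges on a single line.
html = (
    '[<span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>404.1K</span>, '
    '<span class="badge">Entertainment</span>, '
    '<span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>957.2K</span>]'
)

# The fixed text that precedes every like count.
marker = r'class="fa fa-thumbs-up"></i>'

# Greedy: .* runs to the end of the line, then backtracks to the LAST "<".
greedy = re.findall(marker + r"(.*)<", html)

# Lazy: .*? stops at the FIRST "<" after the marker.
lazy = re.findall(marker + r"(.*?)<", html)

# Lookbehind: match only the word characters and dots that follow the marker.
lookbehind = re.findall(r"(?<=" + marker + r")[\w.]+", html)

print(len(greedy))   # 1 (one oversized match)
print(lazy)          # ['404.1K', '957.2K']
print(lookbehind)    # ['404.1K', '957.2K']
```

Python's re module only accepts fixed-width lookbehinds, which works here because the marker is a literal string with no quantifiers.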