csv.writer not writing entire output to CSV file

December 4, 2022

I’m trying to scrape the artists’ Spotify streaming rankings from Kworb.net into a CSV file. I’ve nearly succeeded, but I’m running into a strange issue.

The code below successfully scrapes all 10,000 listed artists and prints them to the console:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    print(beautified_value)

    if len(beautified_value) == 0:
        continue

    rows.append(beautified_value)

The issue arises when I use the following code to save the output to a CSV file:

with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

For whatever reason, only 738 of the artists are saved to the file. Does anyone know what could be causing this?

Thanks so much for any help!

>Solution:

As an alternative approach, you could make your life easier and use pandas, which can parse HTML tables directly into DataFrames.

Here’s how:

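import io

import requests
import pandas as pd

source = requests.get("https://kworb.net/spotify/artists.html")
source.raise_for_status()
# read_html returns a list of DataFrames, one per <table>; flavor="bs4" needs bs4 and html5lib.
# Wrapping the HTML in StringIO avoids the pandas 2.1+ deprecation of literal HTML strings.
df = pd.concat(pd.read_html(io.StringIO(source.text), flavor="bs4"))
df.to_csv("artists.csv", index=False)  # to_csv encodes as UTF-8 by default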

This outputs a .csv file with 10,000 artists.
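As for why the csv.writer version stops early: one plausible cause (a guess, since no traceback was posted) is the file encoding. When open() is called without an encoding argument, it uses the platform's locale encoding, which is commonly cp1252 on Windows. The first artist name containing a character that encoding cannot represent raises UnicodeEncodeError inside writerows(), and the rows written before that point are flushed when the with block closes, leaving a truncated file. DataFrame.to_csv defaults to UTF-8, which would be consistent with the pandas version writing every row. The sketch below reproduces this offline with hypothetical placeholder rows instead of the live page; the helper names write_rankings and count_data_rows are illustrative only.

```python
import csv
import os
import tempfile


def write_rankings(path, headers, rows, encoding):
    # Same writer logic as in the question, with the file encoding made explicit.
    with open(path, "w", newline="", encoding=encoding) as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerows(rows)


def count_data_rows(path, encoding):
    # Read the file back and count the rows after the header.
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Hypothetical placeholder data: two names cp1252 can encode, then two it cannot.
headers = ["Pos", "Artist"]
rows = [["1", "Plain Name"], ["2", "Name with é"], ["3", "名前"], ["4", "이름"]]

path = os.path.join(tempfile.mkdtemp(), "artist_rankings.csv")

# Simulate a cp1252 default: writerows() raises at the first unencodable row,
# but the rows before it are still flushed to disk when the file closes.
try:
    write_rankings(path, headers, rows, encoding="cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 write failed:", exc.reason)
cp1252_count = count_data_rows(path, encoding="cp1252")
print("rows saved with cp1252:", cp1252_count)

# With an explicit UTF-8 encoding, every row is written.
write_rankings(path, headers, rows, encoding="utf-8")
utf8_count = count_data_rows(path, encoding="utf-8")
print("rows saved with utf-8:", utf8_count)
```

If this is the cause, adding encoding="utf-8" to the original open() call should be enough to keep csv.writer. If the file will be opened in Excel, encoding="utf-8-sig" writes a byte-order mark that helps Excel detect UTF-8.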