For loop stops after returning results of 1st element

I have the following scraping script. I need to loop through many links that differ only by the T_IDs in the data dictionary, but the script prints results only for the first T_ID. Any idea how to fix this loop so it prints results for all T_IDs?

import pandas as pd
import requests
import json
import csv
import sys

from bs4 import BeautifulSoup

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}

base_url = "XXXX"  
username = "XXXX"  
password = "XXXX"
toget = data

allowed_results = 50  
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"

result_count = -1  
start_index = 0  

df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['T_ID']:
    while result_count != 0:  
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))  
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results  
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)

        for item in items2:
            new_item = {'id': item['id'], **item['fields']}
            df = df.append(new_item, ignore_index=True)
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])

Solution:


It doesn't stop; it actually runs all the way through. The issue is that start_index is no longer 0 after the loop finishes the first eachId. So when it gets to the next id, it's requesting something like:

`'XXXX.com/3396753/tcyc?&startAt=123&maxResults=50'`

That request likely returns a result_count of 0, which means the while loop body never runs. The same thing then happens for every subsequent id.
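To make the failure concrete, here is a minimal, self-contained reproduction of the same pattern. The `fetch_page` function and the record counts are made up for illustration; only the shape of the loop matches the script above:

```python
def fetch_page(records, start_index, page_size=2):
    # Hypothetical stand-in for one paginated API call.
    return records[start_index:start_index + page_size]

data = {1: list(range(3)), 2: list(range(3))}  # each "id" has 3 records

result_count = -1   # initialised once, outside the for loop -- this is the bug
start_index = 0
fetched = {}

for each_id, records in data.items():
    fetched[each_id] = 0
    while result_count != 0:
        page = fetch_page(records, start_index)
        result_count = len(page)
        start_index += 2
        fetched[each_id] += result_count

print(fetched)  # {1: 3, 2: 0} -- the second id is never fetched
```

Because `result_count` is still 0 and `start_index` is still past the end when the for loop moves on, every id after the first one gets an empty (or skipped) while loop.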

Move the initial result_count = -1 and start_index = 0 inside the for loop, before the while, so they "reset" for each 'T_ID':

import pandas as pd
import requests  
import json
import csv
import sys

from bs4 import BeautifulSoup

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}

base_url = "XXXX"  
username = "XXXX"  
password = "XXXX"
toget = data

allowed_results = 50  
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"

df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['T_ID']:
    start_index = 0  
    result_count = -1  
    while result_count != 0:  
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))  
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results  
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)

        for item in items2:
            new_item = {'id': item['id'], **item['fields']}
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead
            df = pd.concat([df, pd.DataFrame([new_item])], ignore_index=True)
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])
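An alternative worth considering (a sketch, not part of the fix above) is to push the pagination into a generator, so `start_index` lives inside the function and is reset automatically on every call. The `fetch_page` callable here is a hypothetical adapter around the real `requests.get` call:

```python
def iter_items(fetch_page, item_id, page_size=50):
    """Yield every item for one T_ID, paging until resultCount is 0.

    The counters are local variables, so they cannot leak between
    T_IDs the way module-level result_count/start_index can.
    """
    start_index = 0
    while True:
        page = fetch_page(item_id, start_index, page_size)
        yield from page["data"]
        if page["meta"]["pageInfo"]["resultCount"] == 0:
            break
        start_index += page_size

# Wiring it to the real API might look like this (untested sketch,
# reusing base_url/username/password from the script above):
#
# def fetch_page(item_id, start_index, page_size):
#     url = f"{base_url}{item_id}/tcyc?&startAt={start_index}&maxResults={page_size}"
#     return requests.get(url, auth=(username, password)).json()
#
# rows = [{'id': item['id'], **item['fields']}
#         for t_id in data['T_ID']
#         for item in iter_items(fetch_page, t_id)]
# df = pd.DataFrame(rows)
```

Building the DataFrame once from a list of rows also avoids growing it one row at a time inside the loop.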