I would like to get all the school links from the table under "Directory of charter schools in Ohio", across all of its pages, from this website using the requests module. Reaching the school links on the later pages requires POST requests with a JSON payload. However, I can only get the links from the first page, using a GET request.
When I try a POST request, I always get json.decoder.JSONDecodeError, probably because I didn't include the csrftoken in the headers and cookies (I couldn't find where it comes from).
This is what I've tried:
import requests

link = "https://www.causeiq.com/directory/charter-schools-list/ohio-state/"

headers = {
    'referer': 'https://www.causeiq.com/directory/charter-schools-list/ohio-state/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

params = {"filters": [], "pageNumber": "0", "sortHow": "popularity", "sortDir": "desc"}

with requests.Session() as s:
    s.headers.update(headers)
    resp = s.post(link, json=params)
    print(resp.json())
How can I get a JSON response containing the school links?
>Solution :
To get the first 50 items you can use the following example (for more results, the page suggests using the Cause IQ search interface):
import requests
import pandas as pd

link = "https://www.causeiq.com/directory/charter-schools-list/ohio-state/"

headers = {
    "Referer": "https://www.causeiq.com/directory/charter-schools-list/ohio-state/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    "Origin": "https://www.causeiq.com",
    "X-Requested-With": "XMLHttpRequest",
}

params = {
    "filters": [],
    "pageNumber": "0",
    "sortHow": "popularity",
    "sortDir": "desc",
}

with requests.Session() as s:
    # load cookies
    s.get(link)

    # load the csrftoken cookie and send it back as a request header
    s.get(link, headers=headers)
    headers["X-CSRFToken"] = s.cookies["csrftoken"]

    orgs = []
    # request pages 0-4 and collect the organizations from each response
    for page in range(5):
        params["pageNumber"] = page
        data = s.post(link, json=params, headers=headers).json()
        orgs.extend(data["orgs"])

df = pd.DataFrame(orgs)
# to_markdown() needs the optional "tabulate" package
print(df.head(3).to_markdown(index=False))
Prints:
| url | name | fancy_ein | description | id | det_name | det_secondary_name | det_abbreviation | det_description | det_type | det_meta_prioritygroups | rev_total | det_employees | ass_total | det_ntee | det_primary_address | det_year_formed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| /organizations/e-prep-and-village-prep,262367774/ | E Prep and Village Prep | 26-2367774 | E Prep & Village Prep Cliffs Campus serves students in grades K-8, and welcomes students entering at any grade or ability level. The Prep Schools mission is to seek to provide a high quality, academically rigorous education for the college bound scholar. | 262367774 | E Prep and Village Prep | | | E Prep & Village Prep Cliffs Campus serves students in grades K-8, and welcomes students entering at any grade or ability level. The Prep Schools mission is to seek to provide a high quality, academically rigorous education for the college bound scholar. | 501(c)(3) | ['PRIORITY_4'] | {'one_year_growth': 68.02840839287008, 'year': '2019', 'value': 8657531, 'series': {'2009': 2345208, '2019': 8657531, '2018': 5152421, '2017': 4941446, '2016': 5178179, '2015': 5335043, '2014': 5052185, '2013': 3763971, '2012': 3724549, '2011': 2623930, '2010': 2123260}} | {'one_year_growth': 61.904761904761905, 'year': '2019', 'value': 102, 'series': {'2019': 102, '2018': 63, '2017': 65, '2016': 73, '2015': 75, '2014': 111, '2013': 81, '2012': 59, '2011': 24, '2010': 24}} | {'one_year_growth': 112.10802607191872, 'year': '2019', 'value': 7795759, 'series': {'2009': 1883847, '2019': 7795759, '2018': 3675372, '2017': 3347101, '2016': 3102715, '2015': 2516026, '2014': 1545238, '2013': 1323677, '2012': 1797580, '2011': 1520817, '2010': 1441603}} | B29 | {'zip': '44114', 'city': 'Cleveland', 'phone': '(216) 456-2070', 'metro': '17460', 'street1': '1415 E 36th St', 'street2': None, 'state': 'OH', 'email': ''} | 2008 |
| /organizations/horizon-science-academy-of-lorain,264574311/ | Horizon Science Academy of Lorain | 26-4574311 | Horizon Science Academy of Lorain is public charter school that serving grades K through 12 for children in Cleveland, OH. The school is a college prep school focusing on math, science and technology education. | 264574311 | Horizon Science Academy of Lorain | BECKY M SCHEIMAN | | Horizon Science Academy of Lorain is public charter school that serving grades K through 12 for children in Cleveland, OH. The school is a college prep school focusing on math, science and technology education. | 501(c)(3) | ['PRIORITY_4'] | {'one_year_growth': 4.405666781096662, 'year': '2019', 'value': 7779711, 'series': {'2009': 1276506, '2019': 7779711, '2018': 7451426, '2016': 6766792, '2015': 6322058, '2014': 5848132, '2013': 5728648, '2012': 3824747, '2011': 3283336, '2010': 1933082}} | {'one_year_growth': -9.259259259259256, 'year': '2019', 'value': 98, 'series': {'2009': 18, '2019': 98, '2018': 108, '2016': 95, '2015': 102, '2014': 86, '2013': 74, '2011': 43, '2010': 43}} | {'one_year_growth': 18.03196205490267, 'year': '2019', 'value': 4961529, 'series': {'2009': 152995, '2019': 4961529, '2018': 4203547, '2016': 3266537, '2015': 2040844, '2014': 1026625, '2013': 1025081, '2012': 492032, '2011': 506514}} | B29 | {'zip': '44052', 'city': 'Lorain', 'metro': '17460', 'street1': '760 Tower Blvd', 'street2': None, 'state': 'OH'} | 2009 |
| /organizations/breakthrough-charter-schools,270362848/ | Breakthrough Charter Schools | 27-0362848 | Bcs' mission is to provide sustainable, high-quality public Schools in cleveland's under-served neighborhoods, ensuring all students have access to a public, free, outstanding college preparatory education. Our Schools have been honored by local, state,… | 270362848 | Breakthrough Charter Schools | | | Bcs' mission is to provide sustainable, high-quality public Schools in cleveland's under-served neighborhoods, ensuring all students have access to a public, free, outstanding college preparatory education. Our Schools have been honored by local, state,… | 501(c)(3) | ['PRIORITY_4'] | {'one_year_growth': -28.839613493997206, 'year': '2018', 'value': 6374546, 'series': {'2018': 6374546, '2017': 8957998, '2016': 8603837, '2015': 8390198, '2014': 5450157, '2013': 4680359, '2012': 5001355, '2011': 2887621}} | {'one_year_growth': -4.81927710843374, 'year': '2018', 'value': 79, 'series': {'2018': 79, '2017': 83, '2016': 80, '2015': 53, '2014': 43, '2013': 35, '2012': 34, '2011': 34}} | {'one_year_growth': -44.85996246242701, 'year': '2018', 'value': 817606, 'series': {'2018': 817606, '2017': 1482781, '2016': 2247379, '2015': 2214294, '2014': 1894363, '2013': 1502679, '2012': 1587906, '2011': 449131}} | B29 | {'zip': '44114', 'city': 'Cleveland', 'metro': '17460', 'street1': '3615 Superior Ave Bldg 44 Ste 4403a', 'street2': '', 'state': 'OH'} | 2009 |
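
Since the question asks for the school links specifically, the relative paths in the url column can be turned into absolute links. The following is only a minimal sketch: the helper name school_links and the BASE_URL constant are hypothetical, and it assumes the paths resolve against https://www.causeiq.com (the same host sent in the Origin header). The sample records are copied from the output above so the snippet runs without a network call.

```python
from urllib.parse import urljoin

# Assumption: relative "url" values resolve against the site root.
BASE_URL = "https://www.causeiq.com"


def school_links(orgs, base_url=BASE_URL):
    """Return absolute links for every record that has a non-empty 'url'."""
    return [urljoin(base_url, org["url"]) for org in orgs if org.get("url")]


# Sample records shaped like the entries collected from data["orgs"]
sample_orgs = [
    {
        "url": "/organizations/e-prep-and-village-prep,262367774/",
        "name": "E Prep and Village Prep",
    },
    {
        "url": "/organizations/horizon-science-academy-of-lorain,264574311/",
        "name": "Horizon Science Academy of Lorain",
    },
]

for school_link in school_links(sample_orgs):
    print(school_link)
```

With the real data, passing the collected orgs list to the helper should give one link per school; records without a url key are skipped rather than raising a KeyError.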