file_get_contents or cURL returns a 404 page instead of the index page

I am trying to fetch the index page of an external website as a string, but I get that site's 404 page instead.

I have tried both cURL and file_get_contents. Both return the external site's 404 page rather than the content of the index page.

$homepage = file_get_contents("https://www.creditkarma.ca");
echo $homepage;

cURL:

$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_VERBOSE, true);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}
$homepage = file_get_contents_curl("https://www.creditkarma.ca");
echo $homepage;

The code should return the index page as a string, but it returns the external website's 404 page. How can I solve this? I need the content of the index page as a string.

Note: the 404 page comes from the external website, not from my own .htaccess.

>Solution :

To retrieve a page's HTML with cURL, send the headers a browser would send. As a precaution against automated traffic, many websites refuse requests (or answer with a 404) when no browser information is present, so I make the request look as if it came from a browser. In your updated code above, no user agent actually reaches cURL: $agent is assigned outside file_get_contents_curl(), so it is not in scope inside the function. Something like this should fit the bill:

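$url = "https://www.creditkarma.ca";
// Browser-like user agent string sent with the request
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // skips certificate verification; avoid in production
curl_setopt($ch, CURLOPT_VERBOSE, true);          // log the connection and headers to stderr
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
var_dump($result);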
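
One way to apply this inside a reusable function is to pass the user agent in as a parameter, so the function does not depend on a variable defined outside it. The following is a minimal sketch, not the answer author's code: the helper names browser_like_curl_options() and fetch_page() are hypothetical, and unlike the snippet above it leaves certificate verification at its default. It also reports the HTTP status code, which makes a 404 easy to tell apart from a 200 without reading the verbose log.

```php
<?php
// Hypothetical helper names, introduced for illustration only.

/**
 * Build the cURL options for a browser-like GET request.
 * The user agent is an explicit parameter, so it is always in scope.
 */
function browser_like_curl_options(string $url, string $agent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects, as in the question's code
        CURLOPT_USERAGENT      => $agent, // sent as the User-Agent request header
    ];
}

/**
 * Fetch a page and return its HTTP status, body, and any transport error.
 */
function fetch_page(string $url, string $agent): array
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_curl_options($url, $agent));

    $body   = curl_exec($ch);                            // false on transport failure
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE); // e.g. 200 or 404
    $error  = curl_error($ch);                           // empty string when there was no error

    // The handle is released when $ch goes out of scope.
    return ['status' => $status, 'body' => $body, 'error' => $error];
}

// Example call (performs a real network request, so it is left commented out):
// $page = fetch_page('https://www.creditkarma.ca', $agent);
// echo $page['status'] === 200 ? $page['body'] : "Request failed: {$page['status']} {$page['error']}";
```

Whether a given site accepts the request still depends on that site's own checks; the user agent is only the header this answer identifies as the likely cause.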

UPDATE

I have tested this as a standalone PHP script and get the following results:

*   Trying 104.100.143.79:443...
* TCP_NODELAY set
* Connected to www.creditkarma.ca (104.100.143.79) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: businessCategory=Private Organization; jurisdictionC=US; jurisdictionST=Delaware; serialNumber=4313894; C=US; ST=California; L=San Francisco; O=Credit Karma Inc.; CN=www.creditkarma.ca
*  start date: Mar 16 00:00:00 2020 GMT
*  expire date: Mar 21 12:00:00 2022 GMT
*  subjectAltName: host "www.creditkarma.ca" matched cert's "www.creditkarma.ca"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*  SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.creditkarma.ca
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Accept: */*

* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< x-content-security-policy:
< Server: CK-FG-server
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< ORIGIN-ENV: production
< ORIGIN-DC: us-east4
< Expires: Wed, 12 Jan 2022 18:20:46 GMT
< Cache-Control: max-age=0, no-cache, no-store
< Pragma: no-cache
< Date: Wed, 12 Jan 2022 18:20:46 GMT
< Transfer-Encoding:  chunked
< Connection: keep-alive
< Connection: Transfer-Encoding
< Set-Cookie: ck_cabf=IjA5MTRmMDQ2LTE3OTAtNDQ5MC1hODA3LWUzZTRlZDcwYTdlYSI=; Max-Age=31536000; Expires=Thu, 12 Jan 2023 18:20:46 GMT; Secure; SameSite=Strict; Path=/
< Set-Cookie: ck_crumb=6da1442eb87cee1a6c0c08c56a9b07826949e3dc130925b0fcb774a83d566b71f5a9b634c4e4f198ae8dc4a6722abf41; Secure; HttpOnly; SameSite=Strict; Path=/
< Set-Cookie: ck_trace_id=5544f4ea-9d03-462b-ab5f-8a81c70c6c81; HttpOnly; SameSite=Strict; Path=/
< Set-Cookie: ck_lang=en; SameSite=Strict; Path=/
<
* Connection #0 to host www.creditkarma.ca left intact
string(63139) "<!DOCTYPE html>
<html>
    <head>
 ..... Rest of page here
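
The question also tried file_get_contents, which by default sends no User-Agent header unless one is configured. As a hedged sketch of the same idea for that function, a stream context can carry the user agent. The helper name browser_like_context() is hypothetical, and this variant was not part of the test above, so whether the site accepts it is an assumption.

```php
<?php
// Hypothetical helper name, introduced for illustration only.

/**
 * Create a stream context that sends a browser-like User-Agent header.
 * Options under the 'http' key apply to both http:// and https:// URLs.
 */
function browser_like_context(string $agent)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $agent, // sent as the User-Agent request header
            'ignore_errors' => true,   // still return the body on 4xx/5xx responses
        ],
    ]);
}

// Example call (performs a real network request, so it is left commented out):
// $homepage = file_get_contents('https://www.creditkarma.ca', false, browser_like_context($agent));
// echo $homepage;
```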