How to get the google.com web page using a C socket

I wrote code that should query the google.com web page and display its contents, but it doesn’t work as intended.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_addr */
#include <unistd.h>      /* close */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int sockfd;
    struct sockaddr_in destAddr;

    if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1){
        fprintf(stderr, "Error opening client socket\n");
        /* no socket was created, so there is nothing to close */
        return 1;
    }

    destAddr.sin_family = AF_INET;
    destAddr.sin_port = htons(80);
    destAddr.sin_addr.s_addr = inet_addr("64.233.164.94");
    memset(&(destAddr.sin_zero), 0, 8);

    if(connect(sockfd, (struct sockaddr *)&destAddr, sizeof(destAddr)) == -1){
        fprintf(stderr, "Error with client connecting to server\n");
        close(sockfd);
        return 1;
    }

    char *httprequest1 = "GET / HTTP/1.1\r\n"
        "Host: google.com\r\n"
        "\r\n";

    char *httprequest2 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "\r\n";

    char *httprequest3 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "Upgrade-Insecure-Requests: 1\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
        "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\r\n"
        "\r\n";

    char *httprequest = httprequest2;
   
    printf("start send\n");
    int send_result = send(sockfd, httprequest, strlen(httprequest), 0);
    printf("send_result: %d\n", send_result);

    #define bufsize 1000
    char buf[bufsize + 1] = {0};

    printf("start recv\n");
    int bytes_readed = recv(sockfd, buf, bufsize, 0);
    printf("end recv: readed %d bytes\n", bytes_readed);

    buf[bufsize] = '\0';
    printf("-- buf:\n");
    puts(buf);
    printf("--\n");


    return 0;
}

If I send httprequest1, I get this output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 36
start recv
end recv: readed 528 bytes
-- buf:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:52:16 GMT
Expires: Sun, 09 Oct 2022 11:52:16 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

--

In httprequest2, I changed the Host: header and got the following output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 48
start recv
end recv: readed 198 bytes
-- buf:
HTTP/1.1 400 Bad Request
Content-Length: 54
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:53:19 GMT
Connection: close

<html><title>Error 400 (Bad Request)!!1</title></html>
--

Then I tried copying the headers from a browser (httprequest3), and I got the same result as for httprequest2.

How can I get the full page?

>Solution :

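The header should be Host: www.google.com, not Host: http://www.google.com/. The Host header takes only the host name (plus an optional port), not a full URL, which is why the server answers 400 Bad Request.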

However, that might not give you the home page. Google wants you to use HTTPS, so it will probably redirect you to https://www.google.com/. Implementing HTTPS yourself is not practical; you'll need a TLS library such as OpenSSL.
