Scraping websites with PHP

I’m trying to scrape information directly from the Maersk website.
For example, I’m trying to scrape the information at this URL: https://www.maersk.com/tracking/221242675
I have a lot of tracking numbers to update in the database every day, so I decided to automate it a little.

However, with the following code, the page says it needs JavaScript to work. I have also tried cURL and similar approaches,
but nothing works. Does anyone know another way?

I tried the following code:

<?php
// ------------ test 14 ------------
$html = file_get_contents('https://www.maersk.com/tracking/#tracking/221242675'); //get the html returned from the following url
echo $html;
$ETAupdate = new DOMDocument();

libxml_use_internal_errors(TRUE); //disable libxml errors

if(!empty($html)){ //if any html is actually returned

    $ETAupdate->loadHTML($html);
    libxml_clear_errors(); //remove errors for yucky html
    
    $ETA_xpath = new DOMXPath($ETAupdate);

    //get all the <strong> elements
    $ETA_row = $ETA_xpath->query('//strong');

    if($ETA_row->length > 0){
        foreach($ETA_row as $row){
            echo $row->nodeValue . "<br/>";
        }
    }
}
?>

Solution:

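You need to get the data from the API requests the page makes, rather than scraping the page URL itself. The tracking page is rendered by JavaScript in the browser, so the HTML returned by file_get_contents or cURL does not contain the tracking data. (A headless browser such as Puppeteer would also work, but I really don’t recommend that for this simple task.)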

I took a look at the site and the API endpoint is:

https://api.maersk.com/track/221242675?operator=MAEU

This returns a JSON-formatted response, which you can parse to extract the details. That is much easier than parsing the HTML. Example response below:

{
    "tpdoc_num": "221242675",
    "isContainerSearch": false,
    "origin": {
        "terminal": "YanTian Intl. Container Terminal",
        "geo_site": "1PVA2R05ZGGHQ",
        "city": "Yantian",
        "state": "Guangdong",
        "country": "China",
        "country_code": "CN",
        "geoid_city": "0L3DBFFJ3KZ9A",
        "site_type": "TERMINAL"
    },
    "destination": {
        "terminal": "DCT Gdansk sa",
        "geo_site": "02RB4MMG6P32M",
        "city": "Gdansk",
        "state": "",
        "country": "Poland",
        "country_code": "PL",
        "geoid_city": "3RIGHAIZMGKN3",
        "site_type": "TERMINAL"
    },
    "containers": [ ... ]
}
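
A response like the one above could be fetched and parsed in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `fetchTracking` and `summarizeTracking` are hypothetical, the timeout and `Accept` header are assumptions, and the endpoint is undocumented here, so it may require extra headers or change without notice. Only fields that appear in the sample response are read.

```php
<?php
// Hypothetical helper: download the raw JSON for one tracking number.
// Assumes the endpoint from the answer is reachable without extra headers.
function fetchTracking(string $trackingNumber): string
{
    $url = 'https://api.maersk.com/track/' . rawurlencode($trackingNumber) . '?operator=MAEU';

    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "Accept: application/json\r\n",
            'timeout' => 15, // seconds; an arbitrary choice
        ],
    ]);

    // file_get_contents returns false on failure; @ hides the PHP warning
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request to $url failed");
    }
    return $body;
}

// Hypothetical helper: decode the JSON and keep a few fields of interest.
function summarizeTracking(string $json): array
{
    // Decode to an associative array; throws JsonException on invalid JSON
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    return [
        'tpdoc_num'            => $data['tpdoc_num'] ?? null,
        'origin_city'          => $data['origin']['city'] ?? null,
        'destination_city'     => $data['destination']['city'] ?? null,
        'destination_terminal' => $data['destination']['terminal'] ?? null,
    ];
}
```

With network access, `summarizeTracking(fetchTracking('221242675'))` would return the summary for the tracking number from the question, provided the endpoint still responds as shown. Keeping the download and the parsing in separate functions makes it easy to loop over many tracking numbers and to test the parsing against a saved response.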