
Webscraping using cURL

I want to scrape a specific value from a website. I got it to work last time, but the website now seems to have been updated. The previous page source looked like this:

<td><a href="javascript:void(0)" class="rankRow"
       data-rankkey="25">
        Averages
    </a>
</td>
<td class="page_speed_602217763">
    82.84
</td>

I want the number 82.84. Last time, I found this solution:

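  // Fetch the page; $Player1Link holds the URL passed to cURL.
  $curl = curl_init();
  curl_setopt($curl, CURLOPT_URL, $Player1Link);
  curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // note: disables TLS certificate verification
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
  $curlresult = curl_exec($curl);

  // Parse the HTML and read the first cell whose class contains "page_speed_".
  $html = $curlresult;
  $domd = new DOMDocument();
  @$domd->loadHTML($html); // @ suppresses warnings about malformed HTML
  $xp = new DOMXPath($domd);
  $P1Avg = $xp->query("//td[contains(@class, 'page_speed_')]")->item(0)->textContent;
  $P1Avg = trim($P1Avg);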

Now the website has been updated, and the page source looks like this:

<td><a href="javascript:void(0)" class="rankRow"
       data-rankkey="25">
        Averages
    </a>
</td>
<td style="text-align:right;'">
    89.61
</td>

I still want the number (the average), in this case 89.61. How would I go about making the changes?
Thank you so much in advance.

Solution:

You can use the class="rankRow" on the anchor inside the preceding td, then select the td that immediately follows it:

//td/a[@class='rankRow']/parent::td/following-sibling::td[1]

For the HTML given in the question:

$html = <<<HTML
<td><a href="javascript:void(0)" class="rankRow"
       data-rankkey="25">
        Averages
    </a>
</td>
<td style="text-align:right;'">
    89.61
</td>
HTML;

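$domd = new DOMDocument();
@$domd->loadHTML($html); // @ suppresses warnings about the HTML fragment
$xp = new DOMXPath($domd);
// Find the td that holds the rankRow link, then take the td right after it.
$P1Avg = $xp->query("//td/a[@class='rankRow']/parent::td/following-sibling::td[1]")->item(0)->textContent;
$P1Avg = trim($P1Avg);
echo $P1Avg;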

Output

89.61
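
If the page turns out to contain several rankRow links (the data-rankkey="25" attribute hints that it might, but that is an assumption), item(0) could pick a different row, and it is null when nothing matches. The following is a minimal sketch of a more defensive variant, not a drop-in for the real site: the helper name extractRankValue, the "Wins" row, and the sample markup are made up for illustration. It matches the link by its visible text and returns null instead of failing when the row is missing.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// the cell that follows the "rankRow" link whose label is $label, or null.
// Assumes $label contains no single quote, since it is placed in the XPath.
function extractRankValue(string $html, string $label = 'Averages'): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // td that contains a rankRow link with the wanted label, then the next td.
    $query = sprintf(
        "//td[a[contains(concat(' ', normalize-space(@class), ' '), ' rankRow ')"
        . " and normalize-space(.) = '%s']]/following-sibling::td[1]",
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup modelled on the snippet in the question.
$sample = <<<HTML
<table><tr>
  <td><a href="javascript:void(0)" class="rankRow" data-rankkey="24">Wins</a></td>
  <td style="text-align:right;">12</td>
</tr><tr>
  <td><a href="javascript:void(0)" class="rankRow" data-rankkey="25">
      Averages
  </a></td>
  <td style="text-align:right;">
      89.61   </td>
</tr></table>
HTML;

$avg = extractRankValue($sample);
echo $avg === null ? "not found" : $avg;
```

With the real page, the string returned by curl_exec would be passed in place of $sample. A null result then signals that the layout changed again, rather than causing an error on ->textContent.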