When scraping data, I get only the first result despite using Array.from

I want to scrape data from a webpage. Here’s the code I have. It is supposed to get all the authors, but it only gets the first one (‘Simon Butler’).

Array.from(document.querySelectorAll('#author-group'))
  .map(e =>
      e.querySelector('[class="button-link workspace-trigger button-link-primary"]'))
  .map(e =>
      e.querySelector('[class="button-link-text"]'))
  .map(e =>
      e.querySelector('[class="react-xocs-alternative-link"]'))
  .map(e =>
      e.querySelector('[class="given-name"]').textContent + ' '
      + e.querySelector('[class="text surname"]').textContent)
  .join(', ')

As I see it, the problem comes from using querySelector, which returns only the first matching element. However, when I use querySelectorAll instead, I get the following error: e.querySelectorAll is not a function.

I want to scrape data from https://www.sciencedirect.com/science/article/pii/S0164121219302262.
I didn’t include any HTML because the source is really huge, even for just the portion containing the author information. I’m not familiar enough with HTML or JS to produce a minimal sample of the HTML.

Solution:

Array.from(document.querySelectorAll('#author-group'))

This creates an array with one element in it.


The code you provided uses querySelector, which returns only the first match (which you said is not what you want), but you also said you tried querySelectorAll:

.map(e =>
      e.querySelectorAll('[class="button-link workspace-trigger button-link-primary"]'))

Since the previous step returned an array with an element in it, e is an element.

Elements have querySelectorAll so this is fine.

However, now you are returning a NodeList, not an Element.


.map(e =>
      e.querySelectorAll('[class="button-link-text"]'))

Now e is a NodeList. It isn’t an Element.

NodeLists don’t have querySelector or querySelectorAll methods.

You need to loop over the NodeList (perhaps with a map) and deal with each element one by one.
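
A minimal sketch of that loop, assuming the nesting implied by the selectors in the question. The function name `collectAuthorNames` and the `root` parameter are hypothetical, and the class selectors (`.button-link-text`, `.given-name`, `.surname`) are a looser stand-in for the exact `[class="..."]` matches used above. `flatMap` converts each NodeList into an array and flattens the result, so every later step receives Elements again.

```javascript
// Hypothetical helper: `root` is anything exposing querySelectorAll,
// for example `document` in a browser console.
function collectAuthorNames(root) {
  return Array.from(root.querySelectorAll('#author-group'))
    // Each group yields a NodeList; convert it to an array and flatten,
    // so the next step works on individual Elements, not NodeLists.
    .flatMap(group => Array.from(group.querySelectorAll('.button-link-text')))
    .map(button => {
      const given = button.querySelector('.given-name');
      const surname = button.querySelector('.surname');
      // Assumption: a button without both name parts is not an author entry.
      return given && surname
        ? `${given.textContent} ${surname.textContent}`
        : null;
    })
    .filter(name => name !== null);
}
```

In a browser console this would be called as `collectAuthorNames(document).join(', ')`.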


Probably what you should be doing is calling querySelectorAll once and using descendant combinators to describe the elements containing each author in a single query.

Then you would be able to:

Array.from(document.querySelectorAll('#author-group etc etc etc'))
    .map(e =>  
        e.querySelector('[class="given-name"]').textContent
        + ' '
        + e.querySelector('[class="text surname"]').textContent
    )
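
As a rough illustration of the single-query approach, here is a sketch that takes the selector as a parameter. The default selector is an assumption pieced together from the class names in the question, not something verified against the live page, and `listAuthors` and `root` are hypothetical names.

```javascript
// Assumed selector: one match per author link inside #author-group.
// Adjust it to the real markup before relying on it.
const AUTHOR_SELECTOR = '#author-group .react-xocs-alternative-link';

// `root` is anything exposing querySelectorAll, e.g. `document`.
function listAuthors(root, selector = AUTHOR_SELECTOR) {
  // Array.from accepts a mapping function as its second argument,
  // so the NodeList is converted and mapped in one pass.
  return Array.from(root.querySelectorAll(selector), author => {
    // Optional chaining avoids a TypeError if a name part is missing.
    const given = author.querySelector('.given-name')?.textContent ?? '';
    const surname = author.querySelector('.surname')?.textContent ?? '';
    return `${given} ${surname}`.trim();
  }).join(', ');
}
```

Because the descendant combinator (the space in the selector) already matches every author element, no intermediate `map` steps are needed and no step ever receives a NodeList.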