
How to get just the node's own text, not its children's text, in Puppeteer

Suppose the website's markup looks like this:

<div id="monday">
  ...
  <div class="dish">
    Potato Soup
    <br>
    <span>With smoked tofu</span>
  </div>
</div>

Using Puppeteer, how can I grab just the text node's content ("Potato Soup"), not everything inside .dish?

I’ve tried

let selector = await page.waitForSelector("#monday .dish");
let text = await selector.evaluate(el => el.textContent) ?? "";

but that returns "Potato SoupWith smoked tofu"

Solution:

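textContent is designed to return the text of the element and all of its descendants, so it will always include the <span>. Instead, select the first text node directly under the element, like below: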

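// Take the first non-empty text node (nodeType 3) and trim the indentation around it
let text = await selector.evaluate(el => Array.from(el.childNodes)
                               .find(node => node.nodeType === 3 && node.textContent.trim() !== "")
                               ?.textContent.trim() ?? "")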

nodeType === 3 (Node.TEXT_NODE) means it's a text node. Alternatively, you can check nodeName === '#text'.
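If the element's own text is split across several text nodes (for example, text on both sides of the `<br>`), taking only the first text node misses the rest. A minimal sketch of a variant, using a hypothetical helper `ownText` (not a Puppeteer or DOM API), collects every direct text-node child, trims it, and joins the non-empty pieces with a space:

```javascript
// Hypothetical helper (not part of Puppeteer or the DOM API): returns the
// element's own text, i.e. the text of its direct text-node children only.
// nodeType 3 is Node.TEXT_NODE; element children such as <br> and <span> are skipped.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter((node) => node.nodeType === 3)
    .map((node) => node.textContent.trim())
    .filter((text) => text.length > 0) // drop whitespace-only nodes left by indentation
    .join(" ");
}
```

Because `ownText` does not reference anything outside its own body, it should be possible to pass it straight to Puppeteer, e.g. `await selector.evaluate(ownText)`; Puppeteer serializes the function and runs it in the page with the element as its first argument.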

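The same approach as a plain browser snippet, using the markup from the question:

const elem = document.querySelector("#monday .dish");

// First non-empty text node directly under .dish ("Potato Soup")
const text = Array.from(elem.childNodes)
  .find(node => node.nodeType === 3 && node.textContent.trim() !== "")
  ?.textContent.trim();

console.log(text)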
<div id="monday">
  <div class="dish">
    Potato Soup
    <br>
    <span>With smoked tofu</span>
  </div>
</div>
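To read every dish at once rather than one element handle at a time, `page.$$eval` can run the same text-node filtering over all matching elements. A sketch, assuming `page` is an already-open Puppeteer `Page`; `getDishNames` and its default selector `.dish` are hypothetical:

```javascript
// Sketch: collect the own text of every element matching `selector`.
// Assumes `page` is a Puppeteer Page; getDishNames is a hypothetical helper.
async function getDishNames(page, selector = ".dish") {
  await page.waitForSelector(selector);
  // $$eval runs the callback inside the browser with an array of all matching
  // elements; the callback is serialized, so it must not use outer variables.
  return page.$$eval(selector, (dishes) =>
    dishes.map((dish) =>
      Array.from(dish.childNodes)
        .filter((node) => node.nodeType === 3) // text nodes only
        .map((node) => node.textContent.trim())
        .filter((text) => text.length > 0) // skip indentation-only nodes
        .join(" ")
    )
  );
}
```

With the question's markup, the entry for the Monday dish should come back as "Potato Soup", without the span's "With smoked tofu".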