How to remove all instances of line breaks and tabs using JavaScript

I am scraping a website and need to remove all the \n and \t characters from my strings.

I have tried the following code:

item.post_category = [];
Array.from($doc.find('h6.link')).forEach(function (link) {
  console.log(link.textContent.replace(/\t+\n+/gm, ""));
  item.post_category.push(link.textContent);
});
// this removes the linebreaks but not the tabs

Here are several sample arrays I have to iterate over:

["\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t"]

Ideally, I want my arrays to look like this, with the date and all the \n and \t removed.

["Family,Gender Equality,In the News"]
["In the News"]
["News"]

>Solution :

There are many ways to do this: you could use a regex, split, or a combination of both, depending on your needs.

Here is one possible solution:

let str = "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"

// Remove all newlines and tabs with a regex. Add \r to the character class
// if the input may contain Windows-style '\r\n' line endings.
str = str.replace(/[\n\t]/g, '');

// Here we assume that your string always contains
// the date followed by this character: •.
// So we split on that character and take the second
// item of the resulting array, which is the text without the date.
let result = str.split('•')[1].trim();

console.log(result) // prints 'Family,Gender Equality,In the News'
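
To run the same steps over every scraped string, they can be wrapped in a small helper. This is only a sketch: `cleanCategory` and `samples` are hypothetical names that do not appear in the question, and the sample strings are shortened versions of the ones above. Unlike `split('•')[1]`, it returns the whole cleaned string when no bullet is present instead of throwing.

```javascript
// Hypothetical helper (not part of the original answer): cleans one scraped string.
function cleanCategory(text) {
  // Strip carriage returns, newlines, and tabs wherever they occur.
  const flat = text.replace(/[\r\n\t]/g, '');
  // Assumption: the date, when present, ends at the first bullet character.
  const bulletIndex = flat.indexOf('•');
  const withoutDate = bulletIndex === -1 ? flat : flat.slice(bulletIndex + 1);
  return withoutDate.trim();
}

// Shortened versions of the sample strings from the question.
const samples = [
  "\n\t\t\tJune 15, 2021 • \n\t\t\t\n\t\t\tFamily,\n\t\t\n\t\t\tGender Equality,\n\t\t\n\t\tIn the News\n\t",
  "\n\t\t\tJune 13, 2020 • \n\t\t\t\n\t\tIn the News\n\t",
  "\n\t\t\tJuly 5, 2021 • \n\t\t\t\n\t\tNews\n\t",
];

const cleaned = samples.map(cleanCategory);
console.log(cleaned);
```

In the loop from the question, this would mean pushing `cleanCategory(link.textContent)` rather than the raw `link.textContent`. The character class `[\r\n\t]` matches these characters in any order, whereas `/\t+\n+/` only matches a run of tabs immediately followed by a run of newlines.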