Nokogiri to Find All Data Attributes Using a Wildcard

I’d like to strip all the data attributes from img tags while looping through a document. I’ve tried a few options using has_attribute? and xpath, but none of them has returned a match.

article.css('img').each do |img|
  # There is a `data-` attribute
  img.has_attribute?("data-lazy-srcset") # true
  # But I only get `false` or empty arrays when trying wildcards
  img.has_attribute?('data-*') # false
  img.has_attribute?("//*[@*[contains(., 'data-')]]") # false
  img.has_attribute?("//*[contains(., 'data-')]") # false
  img.has_attribute?("//@*[starts-with(name(), 'data-')]") # false
  img.xpath("//*[@*[contains(., 'data-')]]") # []
  img.xpath("//*[contains(., 'data-')]") # []
end

How do I select all data- attributes on these img tags?

Solution:

You can search for img tags with an attribute that starts with "data-" using the following:

//img[@*[starts-with(name(),'data-')]]

To break this down:

  • // – Anywhere in the document
  • img – img tag
  • @* – All Attributes
  • starts-with(name(),'data-') – Attribute’s name starts with "data-"

Example:

require 'nokogiri'

doc = Nokogiri::HTML(<<-END_OF_HTML)
  <img src='' />
  <img data-method='a' src= ''> 
  <img data-info='b' src= ''> 
  <img data-type='c' src= ''> 
  <img src= ''> 
END_OF_HTML

imgs = doc.xpath("//img[@*[starts-with(name(),'data-')]]")

puts imgs 
# <img data-method="a" src="">
# <img data-info="b" src="">
# <img data-type="c" src="">

Or, using a loop as in your question:

doc.css('img').select do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").any?
end
#[#<Nokogiri::XML::Element:0x384 name="img" attributes=[#<Nokogiri::XML::Attr:0x35c name="data-method" value="a">, #<Nokogiri::XML::Attr:0x370 name="src">]>, 
# #<Nokogiri::XML::Element:0x3c0 name="img" attributes=[#<Nokogiri::XML::Attr:0x398 name="data-info" value="b">, #<Nokogiri::XML::Attr:0x3ac name="src">]>, 
# #<Nokogiri::XML::Element:0x3fc name="img" attributes=[#<Nokogiri::XML::Attr:0x3d4 name="data-type" value="c">, #<Nokogiri::XML::Attr:0x3e8 name="src">]>]

UPDATE: To remove the attributes:

doc.css('img').each do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").each(&:remove)
end

puts doc.to_s
#<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#<html>
#<body>
#    <img src="">  
#    <img src="">  
#    <img src="">  
#    <img src="">  
#    <img src="">
#</body>
#</html>

Because the XPath can select the attribute nodes directly, this can be simplified to doc.xpath("//img/@*[starts-with(name(),'data-')]").each(&:remove)
