Groovy - regex to retrieve the inner text of an HTML tag


I want to match the inner part of a string between span tags, where it is guaranteed that the id of those span tags starts with blk.

How can I match it in Groovy?

Example:

<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p> 

Given the example above, I want to get:

   match    between    starts 

I tried the following, but it returns null:

    def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''
    html = html.findAll(/<span id="blk(.)*">(.)*<\/span>/).join()
    println html
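For comparison, the regex attempt has two problems: `(.)*` is greedy, so a match can run from the first `blk` span all the way to the last `</span>`, and `findAll` without a closure returns whole matches and ignores capture groups. A minimal sketch of a regex that avoids both, assuming the `id` attribute is the first attribute, is written in double quotes exactly as in the sample, and that spans are never nested:

```groovy
def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''

// [^"]* keeps the id match inside its quotes; (.*?) is lazy, so each match
// stops at the first </span> instead of running on to the last one
def pattern = /<span id="blk[^"]*">(.*?)<\/span>/

// With a closure, findAll passes the full match followed by each capture group,
// and collects whatever the closure returns (here: just the inner text)
List<String> inner = html.findAll(pattern) { full, text -> text }

println inner
```

The `<span>where</span>` element is skipped because it has no `id`. This remains fragile against attribute order, single quotes, or nested spans, which is why parsing is usually the safer route.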

Rather than messing around with regular expressions, why not parse the HTML and extract the nodes from it?

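    @Grab( 'net.sourceforge.nekohtml:nekohtml:1.9.18' )
    import org.cyberneko.html.parsers.SAXParser
    // Groovy 3+; on Groovy 2.x, XmlSlurper is auto-imported from groovy.util
    import groovy.xml.XmlSlurper

    def html = '''<p>
                 |  wanted try <span id="blk1">match</span> inner part
                 |  of string<span id="blk2"> between </span> span tags <span>where</span>
                 |  guaranteed id of span tags <span id="blk3">starts</span>
                 |  blk.
                 |</p>'''.stripMargin()

    // NekoHTML is a lenient parser, so HTML that is not well-formed XML also parses
    def content = new XmlSlurper( new SAXParser() ).parseText( html )

    // '**' walks all descendants; NekoHTML upper-cases element names by default,
    // so compare case-insensitively, then keep spans whose id starts with "blk"
    List<String> spans = content.'**'.findAll {
        it.name().equalsIgnoreCase( 'span' ) && it.@id?.text()?.startsWith( 'blk' )
    }*.text()

    println spans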
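If the fragment happens to be well-formed XML, as the sample in the question is, a plain `XmlSlurper` can parse it without the NekoHTML dependency. A sketch assuming Groovy 3 or later (where `XmlSlurper` lives in `groovy.xml`) and well-formed input; the `trim()` call is an added choice to drop the spaces around " between ":

```groovy
import groovy.xml.XmlSlurper

def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''

// Parse the fragment as plain XML; this throws on markup that is not well-formed
def root = new XmlSlurper().parseText(html)

// '**' walks the root and all its descendants depth-first;
// keep <span> elements whose id attribute starts with "blk"
List<String> spans = root.'**'.findAll { node ->
    node.name() == 'span' && node.@id.text().startsWith('blk')
}.collect { node -> node.text().trim() }

println spans
```

A span without an `id` yields an empty string from `node.@id.text()`, so it is filtered out rather than causing a null error. For arbitrary real-world HTML, the NekoHTML-backed parser above is the more robust choice.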
