Groovy - regex to retrieve inner HTML tag
I want to match the inner part of a string between span tags, where the id of the span tags is guaranteed to start with "blk".
How can I do this in Groovy?
Example:
<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>
Given the example above, I want to get "match", "between" and "starts". I tried the following, but it returns null:
def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''
html = html.findAll(/<span id="blk(.)*">(.)*<\/span>/).join()
println html
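The attempt above goes wrong mainly because the quantifiers are greedy: "(.)*" runs on to the last possible "</span>", so everything from the first matching span to the last one becomes a single match, and a repeated group such as "(.)" only keeps the last character it captured. If a regex is still preferred, a minimal sketch using a negated character class and a reluctant quantifier could look like this. It assumes spans are never nested and that id is the first attribute on the tag, as in the example:

```groovy
def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''

// [^"]* stops at the closing quote of the id attribute, and the reluctant (.*?)
// stops at the first </span>, so every matching span yields its own match.
// With a closure, findAll passes the whole match followed by each capture group.
List<String> inner = html.findAll(/<span id="blk[^"]*">(.*?)<\/span>/) { full, text -> text.trim() }

println inner            // [match, between, starts]
println inner.join(' ')  // match between starts
```

The plain span containing "where" is skipped because it has no id starting with "blk".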
Rather than messing around with regular expressions, why not parse the HTML and extract the nodes from it?
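@Grab( 'net.sourceforge.nekohtml:nekohtml:1.9.18' )
import org.cyberneko.html.parsers.SAXParser

// stripMargin() removes everything up to and including the leading '|' on each line
def html = '''<p>
             | I wanted to try to <span id="blk1">match</span> the inner part
             | of a string<span id="blk2"> between </span>the span tags <span>where</span>
             | it is guaranteed the id of the span tags <span id="blk3">starts</span>
             | with blk.
             |</p>'''.stripMargin()

// NekoHTML's SAXParser tolerates real-world, non-well-formed HTML
def content = new XmlSlurper( new SAXParser() ).parseText( html )

// Walk all nodes depth-first and keep spans whose id starts with 'blk'.
// NekoHTML upper-cases element names by default, hence the case-insensitive check.
List<String> spans = content.'**'.findAll {
    it.name().equalsIgnoreCase( 'span' ) && it.@id?.text()?.startsWith( 'blk' )
}*.text()

println spans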
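If the markup is known to be well-formed XML, as the example snippet in the question happens to be, the NekoHTML dependency can be dropped and Groovy's own XmlSlurper used directly. This is a minimal sketch assuming Groovy 3 or later (on Groovy 2.x the class is groovy.util.XmlSlurper instead); it will throw a parse error on typical non-well-formed HTML, which is exactly the case NekoHTML is there for:

```groovy
import groovy.xml.XmlSlurper

def html = '''<p>i wanted try <span id="blk1">match</span> inner part of string<span id="blk2"> between </span>the span tags <span>where</span> guaranteed id of span tags <span id="blk3">starts</span> blk.</p>'''

// Works only because this particular snippet is well-formed XML.
def root = new XmlSlurper().parseText(html)

// '**' walks every node depth-first; keep <span> elements whose id starts with "blk".
// For a span without an id, @id.text() is the empty string, so it is filtered out.
List<String> spans = root.'**'.findAll { node ->
    node.name() == 'span' && node.@id.text().startsWith('blk')
}*.text()*.trim()

println spans  // [match, between, starts]
```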