How to trigger a Perl multiline substitution
I have a folder of HTML files with the DOCTYPE declaration below, which I need to remove so that a not-very-good parser can load them as XML.
I've been trying to use a Perl substitution to edit the files in place, but no change is made when I run it, and I can't figure out why. Can anyone identify the correct flags or syntax I need in order to remove the DOCTYPE declaration here?
Here's an example file I'd like to manipulate:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="generator" content=
    "HTML Tidy for Linux/x86 (vers 25 March 2009), see www.w3.org" />
    <title></title>
    </head>
    <body>
    </body>
    </html>

Here's the Perl one-liner I'm trying to use. It looks for an opening angle bracket and an exclamation mark, followed by everything up to the closing angle bracket. It incorporates the substitution flags that other postings suggest should make a multiline match work: m for multiline, and s to allow newlines to be matched by the regex. I'm replacing the match with an empty string.

    perl -i -e 's/<![^>]+>//gsm' `find . -name '*.html'`

I can't figure out why, but the DOCTYPE is not removed from the file after running this command. Does anyone know why?
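What you need is the -0777 switch, which causes the entire file to be read as a single string. Without it, files are read in line-by-line mode, and a pattern can never match a statement that spans multiple lines.
Also, as Andomar points out, you are missing the -p switch. Without -p (or -n), perl never reads the files at all, so -i has nothing to rewrite. I assume you have figured that out already.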
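The modifiers on the regex don't matter in this case, except for /g. /m changes how ^ and $ behave, and /s lets the wildcard . match newlines. This regex uses neither anchors nor ., and a negated character class such as [^>] already matches newlines on its own, so neither modifier applies.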
So basically, you want something like:

    perl -0777 -pi -e 's/<![^>]+>//g' ...
Ideally, HTML should be handled by a parser, so I spent a few minutes working on a version using HTML::Parser, which has a convenient way to strip declarations by adding a handler for them. This seems to print OK for a single file:

    perl -MHTML::Parser -we '
        $p = HTML::Parser->new(default_h => [sub { print @_ }, "text"]);
        $p->handler(declaration => "");
        $p->parse_file(shift) or die $!;
    ' yourfile.html

I figured it was overkill, so I abandoned trying to make it work with the -pi in-place edit switches; that would (probably) need to be implemented as a script.