Regex/wildcard replace on a string in PHP
I have a mass of text that gets loaded into the header, and within it lies this link:
<link rel="canonical" href="could_be_anything_here_at_all" />
I'm looking to replace this with a new value, but the href changes based on the page, meaning a simple str_replace isn't possible.
I've looked at using preg_replace, but I can't get my head around what seems like a simple problem.
$regex = '/(^<link rel="canonical")(\/>$)/';
$match = preg_match_all($regex, $content, $matches);
var_dump($matches);
- Do the / / start and end the expression?
- Do the () indicate separate 'expressions' that have their matched string returned?
- Does the ^ filter results to those that begin with the following string?
- Does the $ filter results to those that end with the following string?
So I'm looking for a string that begins with <link rel="canonical" and ends with />
I've shown the steps I'm after, and my stab at it. Please help me write this and understand how to do it. I'm at a loss on this one.
The regular expression you've written is a bit all over the place. Let's go over the pattern:
Whatever happens, the tag has to begin with <link and end with ></link> or /> (gotta account for those pesky non-respecting-of-standards web buccaneers). You're looking for the rel parameter, if it has one, and it needs to be canonical.
We can start by writing this regular expression: #<link([^>]+)(/>|></link>)#is. This will match all link tags. You can then parse the parameters using simple strpos calls.
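As a rough sketch of that first approach (the sample markup, the example.com URL, and the variable names are made up for illustration), the broad pattern collects every link tag and a plain string search then picks out the canonical one. It uses stripos, the case-insensitive sibling of strpos, and assumes the attribute is written with double quotes:

```php
<?php
// Hypothetical input: two link tags, only one of them canonical.
$content = '<head><link rel="stylesheet" href="style.css" />'
         . '<link rel="canonical" href="http://example.com/old" /></head>';

// Group 1 captures the attribute text of every <link> tag.
preg_match_all('#<link([^>]+)(/>|></link>)#is', $content, $matches);

$canonicalTags = [];
foreach ($matches[1] as $i => $attributes) {
    // Keep the full tag only if its attributes mention rel="canonical".
    if (stripos($attributes, 'rel="canonical"') !== false) {
        $canonicalTags[] = $matches[0][$i];
    }
}

var_dump($canonicalTags);
```

This keeps the regular expression simple and does not depend on the order of the parameters inside the tag.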
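If you are sure that rel="canonical" is the first parameter of the link tag, you can expand the regular expression further: #<link rel="canonical" href="?'?([^"']+)"?'?\s*(/>|></link>)#is. This matches the parameters in order, which is fine if you are sure of the order. The \s* allows for the space before /> in your example tag.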
In order of appearance:
- [^>]+ matches any character other than >, 1 or more times
- the i and s flags stand for: case-insensitive, and do not break on newlines
- "?'? matches 0 or 1 ", followed by 0 or 1 '
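To tie this back to the original goal of replacing the tag, here is a minimal sketch using the stricter pattern. The $newHref value and the variable names are assumptions for illustration, and the pattern allows optional whitespace (\s*) before the closing />, since the sample tag in the question has a space there. preg_replace_callback is used rather than preg_replace so that a $ or backslash in the new URL cannot be misread as a backreference:

```php
<?php
// Sample header text, using the tag from the question.
$content = '<head><title>Demo</title>'
         . '<link rel="canonical" href="could_be_anything_here_at_all" /></head>';

// Hypothetical replacement value.
$newHref = 'http://example.com/new-page';

// \s* tolerates the space before "/>" in the sample tag.
$pattern = '#<link rel="canonical" href="?\'?([^"\']+)"?\'?\s*(/>|></link>)#is';

$updated = preg_replace_callback(
    $pattern,
    function (array $m) use ($newHref) {
        // $m[1] holds the old href; it is simply discarded here.
        return '<link rel="canonical" href="' . htmlspecialchars($newHref) . '" />';
    },
    $content,
    -1,
    $count // number of replacements made
);

echo $updated, "\n";
```

Checking $count afterwards is a cheap way to notice pages where the tag was written in a form the pattern does not cover.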
If anything else is unclear, let me know.
Edit: to answer your questions:
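- the / / start and end the expression? They're called delimiters, and they "encase" the expression. The Perl-compatible regular expression engine allows flags to be set regarding the expression (i, s, m, x, u, etc.), and they have to be outside of the expression. They go after the closing delimiter, and that's the point of the delimiter. You can use any character that is not alphanumeric, a backslash, or whitespace; pick one that is least likely to appear in the pattern itself. People tend to use / due to JS using that single char for them. I tend to prefer # in PHP to clear up any / ambiguities arising from closing HTML tags.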
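A small illustration of the delimiter choice (the sample string is made up): both patterns below match the same closing tag, but the / version has to escape the slash inside the pattern.

```php
<?php
// Hypothetical sample with an explicit closing tag.
$html = '<link rel="canonical" href="x"></link>';

// With / as the delimiter, every literal / in the pattern must be escaped.
$withSlash = preg_match('/<\/link>/i', $html);

// With # as the delimiter, the closing tag can be written as-is.
$withHash = preg_match('#</link>#i', $html);

var_dump($withSlash, $withHash);
```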
- the () indicate separate 'expressions' that have their matched string returned? () matches a subset and allows you to get it in the results if you specify a variable for the matches. Every part of the regular expression can use wildcards & co, but apart from the full match at index 0, only the stuff encased in () is returned in the matches.
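For instance, in this small sketch (the URL and the simplified pattern are assumptions for illustration), the matches array holds the whole match at index 0 and the contents of the single () group at index 1:

```php
<?php
// Hypothetical tag to inspect.
$content = '<link rel="canonical" href="http://example.com/page" />';

// One capture group around the href value.
preg_match('#<link rel="canonical" href="([^"]+)"\s*/>#i', $content, $matches);

$fullTag = $matches[0]; // the entire matched tag
$href    = $matches[1]; // only what the () group matched

var_dump($fullTag, $href);
```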
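- the ^ filters results to those that begin with the following string? Nope. Outside a [] range, ^ means the match has to start with what follows it, full stop. It anchors to the start of the whole string (or of each line, with the m flag), effectively, not to "words".
- the $ filters results to those that end with the following string? Same as above, but for "end" rather than "start".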
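A short sketch of what the anchors do in practice (the sample strings and the simplified patterns are made up for illustration). The pattern from the question fails even on the bare tag because nothing between its two groups accounts for the href part, and an anchored pattern stops matching as soon as the tag sits inside a larger page:

```php
<?php
// Hypothetical inputs: the tag on its own, and the tag inside a page.
$tag  = '<link rel="canonical" href="x" />';
$page = "<head>\n" . $tag . "\n</head>";

// The pattern from the question: the two groups must be directly adjacent.
$original = '/(^<link rel="canonical")(\/>$)/';
$originalOnTag = preg_match($original, $tag);

// Anchored: the whole subject must be exactly one canonical tag.
$anchored = '#^<link rel="canonical"[^>]*/>$#';
$anchoredOnTag  = preg_match($anchored, $tag);
$anchoredOnPage = preg_match($anchored, $page);

// Unanchored: the tag may appear anywhere in the subject.
$unanchored = '#<link rel="canonical"[^>]*/>#';
$unanchoredOnPage = preg_match($unanchored, $page);

var_dump($originalOnTag, $anchoredOnTag, $anchoredOnPage, $unanchoredOnPage);
```

Since the tag in the question lives inside a larger block of header text, the unanchored form is the one that fits.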